Open Source
Relational Databases
Who and what is this about?
• Emanuel Calvo, currently at
OnGres as a PostgreSQL
Consultant and ayres.io as
_root_.

• Working on Modern
techniques for DBRE.

• What is the current status of
the Open Source SQL
databases per component?

• What’s the good, the bad and
the ugly in the market?
ER model
Entity-Relationship and why SQL isn’t considered so.

At least in its pure state.
The ER Map
• Needs a First-Order logic
language for retrieving data.

• Relational Algebra

• Tuple and Domain Relational
Calculus.
The model example
• Obscures everything behind
the complexity of the storage.

• It is represented as relational
algebra, but is hidden from
you.

• How to select the names of
the people of "Black" team?
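The question above can be answered with three relational algebra operators: a join, a selection and a projection. The sketch below is only an illustration; the `people` and `teams` relations and their attribute names are assumptions, since the slide's actual schema is not part of this transcript.

```python
# Hypothetical relations (assumed for illustration, not taken from the slides).
people = [
    {"person": "Ana", "team_id": 1},
    {"person": "Luis", "team_id": 2},
    {"person": "Sol", "team_id": 1},
]
teams = [
    {"team_id": 1, "team": "Black"},
    {"team_id": 2, "team": "White"},
]


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only some attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[name] == r_row[name] for name in shared):
                result.append({**l_row, **r_row})
    return result


# pi_person( sigma_team='Black'( people JOIN teams ) )
black_names = project(
    select(natural_join(people, teams), lambda row: row["team"] == "Black"),
    ["person"],
)
print(black_names)
```

In SQL the same request would be written declaratively (for example a join on `team_id` with a `WHERE team = 'Black'` filter), and the engine derives an expression like the one above on its own.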
Some SQL:2011 tangent
distinctions
• Support NULLs

• Support SubQueries

• Column order (horizontal alignment)
matters, depending on the engine

• SQL/MED
• Is a declarative language

• Hides all the complexity of
execution from the end user

• Planners were very advanced
already.
The Transaction
Model
Concurrency, consistency and availability.
The Entity Consistency
• CAP Theorem (Consistency, Availability and Partition
Tolerance). PACELC adds the choice between [L]atency
and [C]onsistency when there is no partition.

• ACID (Atomicity, Consistency, Isolation and Durability)

• BASE (Basically Available, Soft State, Eventual
consistency)
The chosen
We grab them by the storage and use them wisely without paying money to Oracle.
• CockroachDB

• PostgreSQL

• MySQL / MariaDB

• Clickhouse

• MongoDB
Components
The Lego
• Storage Engine

• Planner

• Protocol

• Language

• Ecosystem

• Framework

• WAL

• Transaction Manager

• Source Code availability, documentation both user and internal,
community, etc.
Storage Engine
• Buffer Management

• I/O method (direct I/O, fsync)

• Transaction Management (storage
layer)

• Point in Time Recovery and Undo Log

• For distributed engines you want to read the Jepsen tests.

• It is the secret sauce
Wide-range Storage Engine
Map
Columnar Based
Tuple Based
Log-Structured Merge Tree (LSM)
Quick cherry pick
• Fast for aggregations

• Easy for parallelization

• Better compression due to the columnar layout

• Better for scaling to massive amounts of data
• Bloom filters

• Sparse indexes by design

• Avoid Write Amplification

• Index-based storage

• More disk efficient, more CPU
• Better for concurrency 

• Hard to scale

• Better when manipulating entities atomically

• Balance between performance and concurrency.
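The Bloom filters listed above let an engine skip a storage file that cannot contain a key, at the price of occasional false positives. The following is a minimal sketch of the idea, not any engine's implementation; the class name, the default sizes and the use of SHA-256 as the hash are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; real filters pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("user:42")
print(sstable_filter.might_contain("user:42"))
```

A reader would consult the filter before touching the file: a negative answer saves the disk read, while a positive answer still requires the actual lookup.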
“Relational databases require a Query Optimizer/
Query Planner to translate the first-order logic
language into relational algebra and to apply other
optimizations. The result is called an Execution Plan.”
– Jorge from Lanús Oeste (drives an Uber but knows a lot about databases)
Planner
Plan: {

…
• Heuristic

• Cost based {Parametric, MO, MOP}

• Mixed

• Planner, Resolver, Optimizer, Executor
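To make the cost-based approach concrete, here is a toy sketch in which the planner estimates a cost for each candidate access path and keeps the cheapest one. The cost constants and function names are invented for illustration and do not correspond to any real engine's cost model.

```python
def seq_scan_cost(table_rows, row_cost=1.0):
    """Assumed model: a sequential scan reads every row once."""
    return table_rows * row_cost


def index_scan_cost(table_rows, selectivity, lookup_cost=4.0, startup_cost=10.0):
    """Assumed model: pay a startup cost plus a random lookup per matching row."""
    return startup_cost + table_rows * selectivity * lookup_cost


def choose_plan(table_rows, selectivity):
    """Cost-based choice: estimate every candidate and keep the cheapest."""
    candidates = {
        "seq_scan": seq_scan_cost(table_rows),
        "index_scan": index_scan_cost(table_rows, selectivity),
    }
    return min(candidates, key=candidates.get)


print(choose_plan(100_000, 0.001))  # few matching rows
print(choose_plan(100_000, 0.5))    # half of the table matches
```

A heuristic planner would instead apply a fixed rule (for example "always use an index when one exists"), and a mixed planner combines both strategies.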
• MySQL also has Condition Pushdown

• PostgreSQL has a rich planner

• MySQL plan output lacks information

• PostgreSQL does not provide additional tools for plan reading.
Protocol
• Client Protocol

• Replication Protocol

• Logical/Binary

• Coordination Protocol

• HA protocol

• Gossip

• Consensus {Raft, Paxos}

• …
• No standard

• JSON is becoming more present (thankfully)

• Absence of internal consensus
Language
• Abstracts all relational algebra

• SQL != Relational

• NULLs

• Column Alignment

• Subquery

• Mixed implementations

• Relational is conceptually unable to return more than 1 result set.
• Standard

• Backward Compatibility

• Modern
What do we want?
Postgres95 -> PostgreSQL
“Postgres' original implementation was in QUEL and
its organization resembles many of the concepts
of the original ER model. COPY is an inherited piece
from this prior implementation.”
Ecosystem
• Single Provider or fake Open Source

• Community contribution or Social Entropy Experiment

• Satellite companies building tools

• Satellite companies building forks

• Satellite coders copy-pasting

• Tons of under-proven libraries
• Multi-database tools tend to fail awesomely

• Choose tools that are integrated with the core and that have frequent updates

• Bug fixing is tied to community timelines

• MySQL: bugs.mysql.com

• Postgres uses mailing lists

• ClickHouse/CockroachDB use GitHub
Framework
• Core extensibility plugins or extensions

• Customize Planner

• Manage protocol

• Creating workers

• Creating own types
• Complex, generally in C.

• Multi-provider packages.
WAL
• WAL or Redo

• MySQL has an undo log, but only for rollback space.

• Postgres ships a tool for rewinding (pg_rewind)

• It can reside in the Storage Engine or in higher layers

• It’s local and provides consistency and durability

• Distributed WALs or a Certification log could be in this group, although there will always be a WAL.
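The core rule of a write-ahead (redo) log is that a change is recorded in the log before it is applied to the data, so that replaying the log after a crash rebuilds the lost state. Below is a minimal in-memory sketch of that rule; the `TinyWAL` class and its record format are hypothetical, and a real engine would append to a durable, fsync'ed file instead of a Python list.

```python
import json


class TinyWAL:
    """Toy redo log: every change is logged before it is applied."""

    def __init__(self):
        self.log = []   # stands in for an append-only, fsync'ed file
        self.data = {}  # stands in for the (volatile) data pages

    def put(self, key, value):
        record = json.dumps({"op": "put", "key": key, "value": value})
        self.log.append(record)  # 1. write ahead: log the change first
        self.data[key] = value   # 2. only then touch the data

    def crash(self):
        # Simulate losing everything that was not in the log.
        self.data = {}

    def recover(self):
        # Redo: replay the log in order to rebuild the state.
        for record in self.log:
            entry = json.loads(record)
            if entry["op"] == "put":
                self.data[entry["key"]] = entry["value"]


wal = TinyWAL()
wal.put("a", 1)
wal.put("a", 2)
wal.put("b", 3)
wal.crash()
wal.recover()
print(wal.data)
```

Replaying only a prefix of the log, up to a chosen record, is the same mechanism that point-in-time recovery builds on.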
Transaction Manager
• It can be at node level or cluster level

• Concept of source and origin

• Group Replication

• Logical Replication

• Concept of Global Id

• Centralized Commits are possible through Kafka brokers

• Functional sharing must rely on node try level

• Serializable only supported by Postgres

• Read Uncommitted only supported by InnoDB
Other components or
capabilities
• Access Methods (B-Tree, L-Tree, Reverse, Hash)

• FTS (Full Text Search) and advanced search

• Geo capabilities
Entity Consistency
at Scale
Replication, Sharding and HA.
What is in the land of single leader engines?
• Async

• Semi-synchronous replication

• First node response, as in MySQL.

• Simple Synchronous replication

• Quorum Synchronous

• Postgres
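The replication modes above differ mainly in how many standby acknowledgements the leader waits for before confirming a commit. The sketch below expresses that as a single rule; the function and standby names are hypothetical. For reference, PostgreSQL expresses quorum commit through its `synchronous_standby_names` setting (for example `ANY 2 (s1, s2, s3)`).

```python
def is_commit_acknowledged(acked, standbys, required):
    """Return True once enough configured standbys have confirmed the commit.

    required = 0              -> asynchronous replication
    required = 1              -> first response wins (semi-synchronous style)
    required = len(standbys)  -> every standby must confirm
    anything in between       -> quorum commit
    """
    confirmed = set(acked) & set(standbys)
    return len(confirmed) >= required


standbys = ["s1", "s2", "s3"]  # hypothetical standby names

print(is_commit_acknowledged([], standbys, required=0))            # async
print(is_commit_acknowledged(["s2"], standbys, required=1))        # first response
print(is_commit_acknowledged(["s2"], standbys, required=2))        # quorum not reached
print(is_commit_acknowledged(["s1", "s3"], standbys, required=2))  # quorum reached
```

Raising `required` trades commit latency for durability, which is the latency versus consistency choice that PACELC describes.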
What is in the land of distributed/multileader[less] engines?
• Asynchronous Multi Leader replication

• BDR

• Snapshot Isolation

• Galera (MySQL layer on top of InnoDB)

• Serializability

• CockroachDB (2PC to a consensus group, with Hybrid Logical Clock, not strict serial)

• VoltDB

• External consistency

• Google Spanner (through TrueTime clocks).
The [full] architecture
Service Check HTTP
Replication Worker / Certification / Tx Coordination
Client Worker
Internal Pooling / Thread Management / Process per worker
External Pooling
Executor
• Write Quorum

• Single Leader

• Multi Leader

• Group Replication

• Inter node coordination

• Distributed Transactions

• Conflict-free Replicated Data Types (LWW, 2P-Set, etc.)
• Consensus for HA

• Also in the entry points if external
• Centralized Commit
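As an illustration of the conflict-free replicated data types mentioned above, the sketch below shows a last-write-wins (LWW) register and a two-phase set (2P-Set), which is how this sketch reads the set type named on the slide. Class names, field names and the tie-break rule are assumptions; the point is only that `merge` gives the same result regardless of the order in which replicas exchange state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LWWRegister:
    """Last-write-wins register: the highest (timestamp, node_id) pair wins."""
    value: object
    timestamp: int
    node_id: str

    def merge(self, other):
        # node_id breaks timestamp ties so that every replica picks the same winner.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


@dataclass
class TwoPhaseSet:
    """2P-Set: an element can be added, then removed, but never re-added."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        if item in self.added:
            self.removed.add(item)

    def contains(self, item):
        return item in self.added and item not in self.removed

    def merge(self, other):
        # Set union is commutative, associative and idempotent.
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)


a = LWWRegister("blue", timestamp=1, node_id="n1")
b = LWWRegister("green", timestamp=2, node_id="n2")
print(a.merge(b).value, b.merge(a).value)

left, right = TwoPhaseSet(), TwoPhaseSet()
left.add("k")
right.add("k")
right.remove("k")
print(left.merge(right).contains("k"), right.merge(left).contains("k"))
```

Because the merge never needs coordination, such types suit multi-leader setups, whereas distributed transactions (for example via two-phase commit) coordinate before committing.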
The status of horizontal scalability in OSDBs
• No native support for distributed consensus.

• Only MySQL has global transaction identifiers (GTIDs) and, recently,
Group Replication.

• There are extensions/forks that provide sharding in
Postgres and MySQL.
SandBox
• https://gitlab.com/3manuek/HA_PoC

• https://gitlab.com/ongresinc/testing-pg-ha-solutions
References
• Designing Data-Intensive Applications (Martin
Kleppmann)

• Database Reliability Engineering (L. Campbell/C. Majors)
Thank you!
@3manuek

3manuek [at] gmail {dot} com

K2G - Insurtech Innovation EMEA Award 2024
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecGDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 

Open Source SQL Databases

  • 4. Who is this and what is it about? • Emanuel Calvo, currently at OnGres as a PostgreSQL Consultant and at ayres.io as _root_. • Working on modern techniques for DBRE. • What is the current status of the Open Source SQL databases, per component? • What’s the good, the bad and the ugly in the market?
  • 5. ER model: Entity-Relationship, and why SQL isn’t considered so, at least in its pure state.
  • 6. The ER Map • Needs a First-Order logic language for retrieving data. • Relational Algebra • Tuple and Domain Relational Calculus.
  • 7. The model example • Obscures everything behind the complexity of the storage. • It is represented as relational algebra, but is hidden from you. • How to select the names of the people of "Black" team?
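The question on this slide can be sketched directly in relational algebra. The snippet below is a minimal illustration, not the speaker's example: the `people` and `teams` relations, their column names, and the sample rows are all assumptions made up for the sketch. It composes a natural join, a selection and a projection, which is roughly what a SQL engine hides from you.

```python
# Toy relations as lists of dicts. Table names, column names and rows
# are hypothetical; they only serve to illustrate the operators.
people = [
    {"person_id": 1, "name": "Ana", "team_id": 10},
    {"person_id": 2, "name": "Luis", "team_id": 20},
    {"person_id": 3, "name": "Marta", "team_id": 10},
]
teams = [
    {"team_id": 10, "team_name": "Black"},
    {"team_id": 20, "team_name": "White"},
]


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only the given attributes, dropping duplicates."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: pair up tuples that agree on every shared attribute."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


# pi_name( sigma_team_name='Black'( people ⋈ teams ) )
black_names = project(
    select(natural_join(people, teams), lambda row: row["team_name"] == "Black"),
    ["name"],
)
print(black_names)
```

Under the same assumed schema, the declarative SQL form would be `SELECT name FROM people JOIN teams USING (team_id) WHERE team_name = 'Black'`; the planner decides in which order the operators actually run.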
  • 8. Some SQL:2011 tangent distinctions • Supports NULLs • Supports subqueries • Column order (horizontal alignment) matters, depending on the engine • SQL/MED • Is a declarative language • Hides all the complexity of execution from the end user • Planners were already very advanced.
  • 10. The Entity Consistency • CAP Theorem (Consistency, Availability and Partition Tolerance). PACELC adds, when there is no partition, the choice between [L]atency and [C]onsistency. • ACID (Atomicity, Consistency, Isolation and Durability) • BASE (Basically Available, Soft State, Eventual consistency)
  • 11. The chosen: we grab them by the storage and use them wisely without paying money to Oracle.
  • 12. • CockroachDB • PostgreSQL • MySQL / MariaDB • ClickHouse • MongoDB
  • 14. • Storage Engine • Planner • Protocol • Language • Ecosystem • Framework • WAL • Transaction Manager • Source Code availability, documentation both user and internal, community, etc. • Buffer Management • I/O method (direct I/O, fsync) • Transaction Management (storage layer) • Point in Time Recovery and Undo Log • For distributed engines you want to read Jepsen tests. • Is the sauce
  • 15. Wide-range Storage Engine Map: Columnar Based • Tuple Based • Log-Structured Merge Tree (LSM)
  • 16. Quick cherry pick • Fast for aggregations • Easy to parallelize • Better compression thanks to the columnar layout • Scales better to massive amounts of data • Bloom filters • Sparse indexes by design • Avoids write amplification • Index-based storage • More disk efficient, more CPU • Better for concurrency • Hard to scale • Better when manipulating entities atomically • Balance between performance and concurrency.
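Bloom filters are listed above as one of the storage-engine tricks. The following is a minimal sketch of the idea only, assuming SHA-256 with a seed prefix as the hash family and arbitrary default sizes; it is not how any of the engines in this talk implement theirs. A filter can answer "definitely not present" or "possibly present", which lets an engine skip reading files or blocks that cannot contain a key.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter. Sizes and hash scheme are illustrative assumptions."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


bf = BloomFilter()
for key in ("user:1", "user:2", "user:3"):
    bf.add(key)

print(bf.might_contain("user:2"))    # always True for keys that were added
print(bf.might_contain("user:999"))  # usually False; false positives are possible
```

The trade-off is memory versus false-positive rate: more bits and a well-chosen number of hashes make "possibly present" answers for absent keys rarer, but never impossible.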
  • 17. “Relational databases require a Query Optimizer/Query Planner for translating the first-order logic language to relational algebra and other optimizations. The result is called Execution Plan.” – Jorge from Lanús Oeste (drives an Uber but knows a lot about databases)
  • 19. Planner • Plan: { … } • Heuristic • Cost based {Parametric, MO, MOP} • Mixed • Planner, Resolver, Optimizer, Executor • MySQL also has Condition Pushdown • PostgreSQL has a rich planner • MySQL plan output lacks information • PostgreSQL does not provide additional tools for plan reading.
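To make the "cost based" bullet concrete, here is a deliberately simplified sketch of how a planner might choose between a sequential scan and an index scan. The three constants echo PostgreSQL's documented defaults for `seq_page_cost`, `random_page_cost` and `cpu_tuple_cost`, but the cost formulas, the `TableStats` type and the function names are assumptions for illustration and are far cruder than any real optimizer.

```python
from dataclasses import dataclass

# Values mirror PostgreSQL's default settings of the same names;
# the formulas that use them below are simplified assumptions.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01


@dataclass
class TableStats:
    pages: int  # number of disk pages in the table
    rows: int   # estimated number of rows


def seq_scan_cost(stats):
    # Read every page sequentially and examine every row.
    return stats.pages * SEQ_PAGE_COST + stats.rows * CPU_TUPLE_COST


def index_scan_cost(stats, selectivity):
    # Pessimistic assumption: one random page fetch per matching row.
    matching = stats.rows * selectivity
    return matching * RANDOM_PAGE_COST + matching * CPU_TUPLE_COST


def choose_plan(stats, selectivity):
    """Return the cheapest candidate and the full cost table."""
    candidates = {
        "SeqScan": seq_scan_cost(stats),
        "IndexScan": index_scan_cost(stats, selectivity),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates


stats = TableStats(pages=1_000, rows=100_000)
print(choose_plan(stats, selectivity=0.001))  # few matching rows
print(choose_plan(stats, selectivity=0.5))    # half the table matches
```

With these assumed numbers the index wins for the highly selective predicate and the sequential scan wins when half the table matches, which is the kind of decision that depends entirely on the statistics the planner has.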
  • 20. Protocol • Client Protocol • Replication Protocol • Logical/Binary • Coordination Protocol • HA protocol • Gossip • Consensus {Raft, Paxos} • …
  • 21. Protocol (continued) • No standard • JSON is becoming more present (thankfully) • Absence of internal consensus
  • 23. Language • Abstracts all relational algebra • SQL != Relational • NULLs • Column Alignment • Subquery • Mixed implementations • Relational is conceptually unable to return more than 1 result set. • What do we want? • Standard • Backward Compatibility • Modern
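Two of the "SQL != Relational" points, NULLs and duplicate rows, are easy to observe. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in engine (it is not one of the databases discussed in this talk), and the `people` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", "Black"), ("Luis", None), ("Ana", "Black")],
)


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Three-valued logic: NULL = NULL is unknown (NULL), not true.
null_equals_null = scalar("SELECT NULL = NULL")

# A row with a NULL team satisfies neither the predicate nor its negation.
in_black = scalar("SELECT count(*) FROM people WHERE team = 'Black'")
not_black = scalar("SELECT count(*) FROM people WHERE team <> 'Black'")
total = scalar("SELECT count(*) FROM people")

# SQL tables are bags, not sets: duplicates stay unless DISTINCT is used.
with_dupes = conn.execute("SELECT name, team FROM people").fetchall()
as_set = conn.execute("SELECT DISTINCT name, team FROM people").fetchall()

print(null_equals_null, in_black, not_black, total)
print(len(with_dupes), len(as_set))
```

Note that `in_black + not_black` does not add up to `total`: the row with the NULL team falls through both filters, which is exactly the kind of behaviour a pure relational model does not have.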
  • 24. Postgres95 -> PostgreSQL “The original Postgres implementation used QUEL, and its organization resembles many of the concepts of the original ER model. COPY is a piece inherited from this prior implementation.”
  • 25. Ecosystem • Single provider or fake Open Source • Community contribution or Social Entropy Experiment • Satellite companies building tools • Satellite companies building forks • Satellite coders copy-pasting • Tons of under-proven libraries
  • 26. Ecosystem • Multi-database tools tend to fail awesomely • Choose tools that are integrated with the core and that have frequent updates • Bug fixing tied to community times • MySQL uses bugs.mysql.com • Postgres uses mailing lists • ClickHouse/CockroachDB use GitHub
  • 27. Framework • Core extensibility: plugins or extensions • Customize Planner • Manage protocol • Creating workers • Creating own types
  • 28. • Complex, generally in C. • Multi-provider packages.
  • 29. WAL • WAL or Redo • MySQL has an undo log, but only for rollback space. • Postgres ships a tool for rewind (pg_rewind) • It can reside in the Storage Engine or in higher layers • It’s local and provides consistency and durability • Distributed WALs or a Certification log could be in this group, although there will always be a WAL.
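The write-ahead rule (log first, make it durable, then apply) can be shown with a toy key-value store. This is a sketch under loud assumptions: the `TinyKV` class, the JSON-lines record format and the file name are invented for illustration, and real engines add checksums, log sequence numbers, checkpoints and torn-write handling that are omitted here.

```python
import json
import os
import tempfile


class TinyKV:
    """Hypothetical in-memory store protected by a redo-only WAL."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._recover()
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _recover(self):
        # Redo: replay every logged record, in order, to rebuild the state.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to stable storage.
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Only then apply it to the (volatile) data structure.
        self.data[key] = value

    def close(self):
        self.wal.close()


wal_path = os.path.join(tempfile.mkdtemp(), "tiny.wal")
db = TinyKV(wal_path)
db.put("a", 1)
db.put("a", 2)
db.put("b", 3)
db.close()  # pretend the process died here: the in-memory dict is gone

recovered = TinyKV(wal_path)  # a fresh instance replays the log
print(recovered.data)
recovered.close()
```

Because the log is local and flushed before the change is acknowledged, the state survives a restart; this is the consistency and durability role the slide assigns to the WAL.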
  • 30. Transaction Manager • It can be at node level or cluster level • Concept of source and origin • Group Replication • Logical Replication • Concept of Global Id • Centralized commits are possible through Kafka brokers • Functional sharing must rely on node try level • Serializable (as Serializable Snapshot Isolation) only supported by Postgres • Read Uncommitted only supported by InnoDB
  • 31. Other components or capabilities • Access Methods (B-Tree, L-Tree, Reverse, Hash) • FTS (Full Text Search) and advanced search • Geo capabilities
  • 33. What is in the land of single leader engines? • Async • Semi-synchronous replication • First node response, as in MySQL. • Simple Synchronous replication • Quorum Synchronous • Postgres
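The difference between these replication modes comes down to how many standby acknowledgements the leader waits for before it reports a commit to the client. The sketch below models three of them; the mode names, the function names and the majority default are assumptions for illustration, not any engine's API. In PostgreSQL, quorum commit is configured with `synchronous_standby_names`, for example `ANY 2 (s1, s2, s3)`.

```python
def required_acks(mode, standbys, k=None):
    """How many standby acknowledgements a commit must wait for."""
    if mode == "async":
        return 0  # never wait for standbys
    if mode == "first_response":
        return 1  # semi-synchronous: the first standby to answer is enough
    if mode == "quorum":
        if k is None:
            k = len(standbys) // 2 + 1  # assumed default: simple majority
        if not 1 <= k <= len(standbys):
            raise ValueError("k must be between 1 and the number of standbys")
        return k
    raise ValueError(f"unknown mode: {mode}")


def commit_acknowledged(mode, standbys, acked, k=None):
    """True once enough distinct, known standbys have confirmed the write."""
    confirmed = len(set(acked) & set(standbys))
    return confirmed >= required_acks(mode, standbys, k)


standbys = ["s1", "s2", "s3"]
print(commit_acknowledged("async", standbys, acked=[]))
print(commit_acknowledged("first_response", standbys, acked=["s2"]))
print(commit_acknowledged("quorum", standbys, acked=["s1"], k=2))
print(commit_acknowledged("quorum", standbys, acked=["s1", "s3"], k=2))
```

The more acknowledgements a mode requires, the more failures a committed write survives, at the price of commit latency being bound to the k-th fastest standby.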
  • 34. What is in the land of distributed/multileader[less] engines? • Asynchronous Multi Leader replication • BDR • Snapshot Isolation • Galera (MySQL layer on top of InnoDB) • Serializability • CockroachDB (2PC to a consensus group, with Hybrid Logical Clock, not strict serial) • VoltDB • External consistency • Google Spanner (through TrueTime clocks).
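The Hybrid Logical Clock mentioned for CockroachDB combines a physical timestamp with a logical counter so that causally related events are ordered even when node clocks drift. What follows is a sketch of the textbook HLC update rules, not CockroachDB's implementation; the class name, the millisecond resolution and the injectable `physical_clock` parameter are assumptions made so the example is deterministic.

```python
import time


class HybridLogicalClock:
    """Textbook HLC: a (physical, logical) pair compared lexicographically."""

    def __init__(self, physical_clock=None):
        # physical_clock is injectable so the example can use fixed clocks.
        self.physical_clock = physical_clock or (
            lambda: time.time_ns() // 1_000_000
        )
        self.l = 0  # highest physical time seen so far
        self.c = 0  # logical counter to order events within the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Merge the timestamp carried by a received message."""
        remote_l, remote_c = remote
        pt = self.physical_clock()
        new_l = max(self.l, remote_l, pt)
        if new_l == self.l and new_l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)


# Node B's wall clock is 10 ms behind node A's.
node_a = HybridLogicalClock(physical_clock=lambda: 100)
node_b = HybridLogicalClock(physical_clock=lambda: 90)

sent = node_a.now()             # A sends a message stamped (100, 0)
received = node_b.update(sent)  # B must not time-travel before the send
later = node_b.now()
print(sent, received, later)
```

Even though B's physical clock lags, the receive event and everything after it on B are stamped later than the send on A, which preserves causal order without requiring tightly synchronized clocks.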
  • 35. The [full] architecture Service Check HTTP Replication Worker / Certification / Tx Coordination Client Worker Internal Pooling / Thread Management / Process per worker External Pooling Executor • Write Quorum • Single Leader • Multi Leader • Group Replication • Inter node coordination • Distributed Transactions • Conflict-Free Replicated Datatype (LWW, 2P-Set, etc) • Consensus for HA • Also in the entry points if external • Centralized Commit
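Two of the conflict-free replicated data types named on this slide, the last-writer-wins register and the two-phase set, are small enough to sketch. The class names, the `(timestamp, node_id)` tie-breaking scheme and the sample values are assumptions for illustration; production CRDT libraries differ in detail.

```python
class LWWRegister:
    """Last-writer-wins register: the highest (timestamp, node_id) wins."""

    def __init__(self):
        self.value = None
        self.stamp = (0, "")  # node_id breaks ties between equal timestamps

    def set(self, value, timestamp, node_id):
        if (timestamp, node_id) > self.stamp:
            self.value, self.stamp = value, (timestamp, node_id)

    def merge(self, other):
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


class TwoPhaseSet:
    """Two-phase set: an element can be added, then removed, but never re-added."""

    def __init__(self):
        self.added = set()
        self.removed = set()  # tombstones

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        if item in self.added:
            self.removed.add(item)

    def contains(self, item):
        return item in self.added and item not in self.removed

    def merge(self, other):
        # Set union is commutative, associative and idempotent,
        # so replicas converge regardless of merge order.
        self.added |= other.added
        self.removed |= other.removed


# Two replicas accept conflicting writes, then exchange state.
reg_a, reg_b = LWWRegister(), LWWRegister()
reg_a.set("x", timestamp=1, node_id="a")
reg_b.set("y", timestamp=2, node_id="b")
reg_a.merge(reg_b)
reg_b.merge(reg_a)
print(reg_a.value, reg_b.value)

set_a, set_b = TwoPhaseSet(), TwoPhaseSet()
set_a.add("k")
set_b.merge(set_a)
set_b.remove("k")
set_a.merge(set_b)
set_a.add("k")  # re-adding has no visible effect: the tombstone wins
print(set_a.contains("k"))
```

Both types converge without coordination, but they do so by discarding information: the register silently drops the older write, and the set can never resurrect a removed element.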
  • 36. The status of horizontal scalability in OSDBs • No native support for distributed consensus. • Only MySQL has global identifiers and, recently, support for Group Replication. • There are extensions/forks for providing sharding in Postgres and MySQL.
  • 38. References • Designing Data-Intensive Applications (Martin Kleppmann) • Database Reliability Engineering (L. Campbell/C. Majors)