Postgres-XC: Write-Scalable PostgreSQL Cluster

Mason Sharp

August 7th, 2012

CC License: Attribution-NonCommercial-ShareAlike

Content Attribution
• Koichi Suzuki
• Michael Paquier
• Ashutosh Bapat
• Pavan Deolasee
• Mason Sharp
• ...?

Aug 7, 2012 2

Who am I
● Mason Sharp
● Co-organizer of NYC PUG
● Co-founder of StormDB
● Previously worked at EnterpriseDB
● Original architect of Stado (GridSQL)
● One of the original architects of Postgres-XC


PostgreSQL User Groups

San Francisco: 616 Members
New York: 502 Members

New:
Philadelphia
Los Angeles
Tokyo
2000? Members


NYC PUG Meetup Membership


NYC PUG Speakers
● Recent speakers include
● Bruce Momjian
● Greg Smith
● Greg Stark
● Joe Conway
● Joachim Wieland


NYC PUG Speakers
We want you!


Postgres-XC Talk
● Background
● Postgres-XC Introduction & Usage
● Postgres-XC Components
● Postgres-XC Details


Background


Data Tier Scaling
● Up versus Out
● More memory, more cores
● Read-only Replicated Slaves
● Caching
● Memcached
● Sharding
● NoSQL
● NewSQL


XC Origins

Koichi Suzuki, NTT Data Mason Sharp


PostgreSQL-Related Clustering
Projects
● pgpool-II
● Read replicated slaves
● PL/Proxy
● Used by Skype, meetme (myYearbook)
● All access is over a stored function
● Postgres-R, PostgresForest
● Stado (GridSQL)
● Parallel Query
● Not write-scalable
Can we make it write scalable?


Postgres-XC Introduction


Overview
● PostgreSQL-based database cluster
● Same API to Apps as PostgreSQL
– Same drivers
● Currently based upon PG 9.1. Soon: 9.2.
● Symmetric Multi-headed Cluster
● No master, no slave
– Not just PostgreSQL replication.
– Application can read/write to any coordinator server
● Consistent database view for all transactions
– Full ACID properties for all transactions in the cluster
● Scales both for Write and Read


Postgres-XC Cluster
Application can connect to any server to have the same database view and service.
[Diagram: several PG-XC servers, each running a Coordinator and a Data Node; add PG-XC servers as needed. The servers communicate with each other and with the Global Transaction Manager (GTM).]


Read/Write Scalability
[Chart: DBT-1 throughput scalability]
Consistency

Is XC right for you?
● I need write scalability
● I like ACID
● I like SQL
● I don't want to rewrite my existing SQL
applications
● I want to leverage the PostgreSQL community
for all of their contrib modules


Why XC may not be right for you
● I need MPP parallel query capability
● Parallel query in XC is limited
● Try Stado: www.stado.us

● I need a solution with built-in HA
● I need massive scale and have loose
consistency requirements
● I would rather use a NoSQL solution so I can
put it on my resume


Postgres-XC Components


Coordinator Overview
● Based on PostgreSQL 9.1 (9.2 soon)
● Accepts connections from clients
● Parses and plans requests
● Interacts with the Global Transaction Manager
● Uses a pooler for Data Node connections
● Sends XIDs and snapshots down to Data Nodes
● Collects results and returns them to the client
● Uses two-phase commit (2PC) if necessary


Data Node Overview
● Based on PostgreSQL 9.1 (9.2 soon)
● Where user-created data is actually stored
● Coordinators (not clients) connect to Data Nodes
● Accepts XIDs and snapshots from Coordinators
● The rest is fairly similar to vanilla PostgreSQL


Global Transaction Manager

[Diagram: the GTM supplies XIDs, snapshots, timestamps, and sequence values to the cluster nodes.]


Summary
● Coordinator
● Visible to apps
● SQL analysis, planning, execution
● Connection pooling
● Datanode (or simply “NODE”)
● Actual database store
● Local SQL execution
● GTM (Global Transaction Manager)
● Provides a consistent database view to transactions
– GXID (Global Transaction ID)
– Snapshot (list of active transactions)
– Other global values such as SEQUENCE
● GTM Proxy: groups server-local transaction requests for performance
● Coordinator and Datanode: the Postgres-XC core, based upon vanilla PostgreSQL; they share the same binary and may be colocated
● GTM and GTM Proxy: different binaries


Data Distribution

Distribution Strategies


Distributing the data
● Replicated table
● Each row in the table is replicated to the datanodes
● Statement based replication
● Distributed table
● Each row of the table is stored on one datanode,
decided by one of the following strategies
– Hash
– Round Robin
– Modulo
– Range and user-defined function (future)
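The placement rules above can be sketched as small functions. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and zlib.crc32 stands in for XC's internal hash function, which is different.

```python
import zlib
from itertools import count

# Hypothetical helpers; XC's actual hash function and node mapping differ.
# zlib.crc32 is used only because it is stable across runs.


def hash_node(value, num_nodes):
    """HASH(col): hash the distribution column value, then pick a node."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def modulo_node(value, num_nodes):
    """MODULO(col): integer column value modulo the number of nodes."""
    return value % num_nodes


class RoundRobin:
    """ROUNDROBIN: rows go to the nodes in turn, regardless of content."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self._counter = count()

    def next_node(self):
        return next(self._counter) % self.num_nodes


def replicated_nodes(num_nodes):
    """REPLICATION: every row is written to every datanode."""
    return list(range(num_nodes))
```

For example, modulo_node(7, 3) returns 1, and a fresh RoundRobin(3) hands out nodes 0, 1, 2, 0, and so on. Hash and modulo place a given key on the same node every time, which is what makes join and WHERE clause pushdown possible; round robin spreads rows evenly but gives no such guarantee.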


Table Distribution and Replication

● Each table can be distributed or replicated
● Strategy based on usage
– Transaction tables → Distributed
– Static lookup tables → Replicated
– Distribute parent-children together
● Join pushdown when possible
● Where clause pushdown
● Simple parallel aggregates


Defining Tables
● Table Distribution/Replication
● CREATE TABLE tab (…) DISTRIBUTE BY
HASH(col) | MODULO(col) | ROUNDROBIN | REPLICATION


Replicated Tables
[Diagram: every datanode holds an identical copy of the table, with rows (val, val2) = (1, 2), (2, 10), (3, 4). A read is served by one datanode; a write is applied on all of them.]
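The read/write behavior of a replicated table can be sketched as a routing function. This is a hypothetical sketch, not XC's planner: the function name is made up, and picking the read node at random is an assumption (XC may prefer a particular node).

```python
import random


def route_replicated(kind, nodes):
    """Return the datanodes a statement on a replicated table must visit.

    Hypothetical sketch: every node holds a full copy, so a read needs only
    one node, while a write must be applied on every node to keep the
    copies identical (statement-based replication).
    """
    if kind == "read":
        return [random.choice(nodes)]
    if kind == "write":
        return list(nodes)
    raise ValueError(f"unknown statement kind: {kind!r}")
```

This is why replication suits static lookup tables. Reads scale with the number of nodes, but every write costs one statement per node.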


Distributed Tables
[Diagram: each datanode holds a different subset of the rows, e.g. (1, 2), (2, 10), (3, 4) on one node, (11, 21), (21, 101), (31, 41) on another, and (10, 20), (20, 100), (30, 40) on a third. A write goes to a single datanode; a read queries all datanodes and a Combiner merges the results.]
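The write path and the combined read path of a distributed table can be sketched as follows. The class is hypothetical, and it assumes MODULO distribution on val for simplicity.

```python
class DistributedTable:
    """Hypothetical sketch of a table distributed by MODULO(val)."""

    def __init__(self, num_nodes):
        # One list of (val, val2) rows per datanode.
        self.nodes = [[] for _ in range(num_nodes)]

    def insert(self, val, val2):
        # A write touches exactly one datanode, chosen from the row itself.
        node = val % len(self.nodes)
        self.nodes[node].append((val, val2))
        return node

    def select_all(self):
        # A read fans out: each datanode returns its local rows...
        partial_results = [list(rows) for rows in self.nodes]
        # ...and the combiner merges the partial results into one answer.
        combined = []
        for part in partial_results:
            combined.extend(part)
        return combined
```

Inserting (1, 2), (2, 10) and (3, 4) into DistributedTable(3) places them on nodes 1, 2 and 0 respectively. A full SELECT then has to visit all three nodes, so writes scale out while unfiltered reads pay a fan-out cost.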


Join Pushdown
● Hash/Modulo distributed joined with:
– Hash/Modulo distributed: inner join with an equality condition on the distribution column, with the same data type and the same distribution strategy
– Round Robin: no pushdown
– Replicated: inner join, if the replicated table's distribution list is a superset of the distributed table's distribution list
● Round Robin joined with:
– Hash/Modulo distributed: no pushdown
– Round Robin: no pushdown
– Replicated: inner join, if the replicated table's distribution list is a superset of the distributed table's distribution list
● Replicated joined with:
– Hash/Modulo distributed: inner join, if the replicated table's distribution list is a superset of the distributed table's distribution list
– Round Robin: inner join, if the replicated table's distribution list is a superset of the distributed table's distribution list
– Replicated: all kinds of joins
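The join pushdown rules above can be encoded as a single decision function. This is a hypothetical sketch, not XC's planner code. The TableInfo type and can_push_down name are made up, and requiring two hash/modulo tables to live on the same set of nodes is an assumption that the rules leave implicit.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TableInfo:
    strategy: str                        # "hash", "modulo", "roundrobin", "replicated"
    nodes: FrozenSet[str]                # datanodes the table lives on (distribution list)
    dist_col_type: Optional[str] = None  # type of the distribution column, if any


def can_push_down(left, right, join_type, equi_join_on_dist_cols=False):
    l_rep = left.strategy == "replicated"
    r_rep = right.strategy == "replicated"
    if l_rep and r_rep:
        return True  # replicated x replicated: all kinds of joins
    if l_rep or r_rep:
        rep, other = (left, right) if l_rep else (right, left)
        # Inner join only, and the replicated table's node list must be a
        # superset of the other table's node list.
        return join_type == "inner" and rep.nodes >= other.nodes
    if "roundrobin" in (left.strategy, right.strategy):
        return False  # round robin with a non-replicated table: never
    # Both hash/modulo: inner equi-join on the distribution columns, same
    # column type, same strategy (and, assumed here, the same nodes).
    return (join_type == "inner"
            and equi_join_on_dist_cols
            and left.strategy == right.strategy
            and left.dist_col_type == right.dist_col_type
            and left.nodes == right.nodes)
```

For instance, a round robin fact table joined to a replicated dimension that covers the same nodes can be pushed down for an inner join but not for an outer join, matching the star-schema case mentioned under Parallel Query.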

Constraints
● XC does not support Global constraints – i.e.
constraints across datanodes
● Constraints within a datanode are supported
● Replicated
– Unique, primary key constraints: supported
– Foreign key constraints: supported if the referenced table is also replicated on the same nodes
● Hash/Modulo distributed
– Unique, primary key constraints: supported if the primary or unique key is the distribution key
– Foreign key constraints: supported if the referenced table is replicated on the same nodes, OR it is distributed by its primary key in the same manner and on the same nodes
● Round Robin
– Unique, primary key constraints: not supported
– Foreign key constraints: supported if the referenced table is replicated on the same nodes
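The constraint rules above can be expressed as two checks. This is a hypothetical sketch: the Table type and function names are made up, and reading "distributed by primary key in the same manner" as "the referencing table is distributed on the foreign key column with the same strategy" is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Table:
    strategy: str                      # "replicated", "hash", "modulo", "roundrobin"
    nodes: FrozenSet[str]
    dist_key: Optional[str] = None     # distribution column for hash/modulo
    primary_key: Optional[str] = None


def unique_supported(table, key):
    """Can a UNIQUE / PRIMARY KEY constraint on `key` be enforced locally?"""
    if table.strategy == "replicated":
        return True
    if table.strategy in ("hash", "modulo"):
        # Equal keys always land on the same node only if the key is the
        # distribution key, so a per-node check is enough.
        return key == table.dist_key
    return False  # round robin: not supported


def foreign_key_supported(referencing, referenced, fk_col):
    """Can an FK from referencing.fk_col to referenced's primary key be checked locally?"""
    same_nodes = referencing.nodes == referenced.nodes
    if referenced.strategy == "replicated" and same_nodes:
        return True
    if referencing.strategy in ("hash", "modulo"):
        # Assumed reading: both tables use the same strategy on the same nodes,
        # the referenced table on its primary key and the referencing table on
        # the FK column, so matching rows are co-located.
        return (same_nodes
                and referenced.strategy == referencing.strategy
                and referencing.dist_key == fk_col
                and referenced.dist_key == referenced.primary_key)
    return False
```

The common thread is co-location: a constraint is enforceable only when every row it must compare against is guaranteed to sit on the same datanode.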


Demo


Transaction Management

Why MVCC is Important for Consistency
Global Transaction Manager


Multi-version Concurrency Control
(MVCC) (quick overview)
● Readers do not block writers
● Writers do not block readers
● Transaction Ids (XIDs)
● Every transaction gets an ID
● Snapshots contain a list of running XIDs


(MVCC) (quickly discussed)
Example:
T1 Begin...
T2 Begin; INSERT...; Commit
T3 Begin...
T4 Begin; SELECT

● T4's snapshot contains T1 and T3
● T2 already committed
● T4 can see T2's changes, but not T1's or T3's
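The T1..T4 example can be checked with a simplified visibility rule. This is a sketch, not PostgreSQL's full algorithm (which also tracks xmin, subtransactions, and hint bits). The Snapshot type and is_visible function are hypothetical, and XIDs 1 to 4 stand for T1 to T4.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Snapshot:
    running: FrozenSet[int]  # XIDs still in progress when the snapshot was taken
    xmax: int                # XIDs >= xmax had not started yet


def is_visible(writer_xid, snapshot, committed):
    """Simplified MVCC rule: changes are visible only if the writing
    transaction had already committed when the snapshot was taken."""
    if writer_xid >= snapshot.xmax:
        return False  # started after the snapshot
    if writer_xid in snapshot.running:
        return False  # still running at snapshot time
    return writer_xid in committed


# T4's snapshot: T1 and T3 are running, T2 has committed.
t4_snapshot = Snapshot(running=frozenset({1, 3}), xmax=4)
committed = {2}
```

With these values, is_visible(2, t4_snapshot, committed) is True, while XIDs 1 and 3 are not visible, exactly as the slide states.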


(MVCC) on 2 Independent Nodes
Example:
T1 Begin...
T2 Begin; INSERT...; Commit;
T3 Begin...
T4 Begin; SELECT

● Node 1: T2 Commit, T4 SELECT
● Node 2: T4 SELECT, T2 Commit
● T4's SELECT statement returns inconsistent data
● It includes T2's data from Node 1, but not from Node 2
● The C in ACID fails
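The two-node anomaly can be reproduced in a few lines, and contrasted with one shared snapshot of the kind a GTM hands out. This is a hypothetical sketch: T2 (XID 2) wrote row "a" on node 1 and row "b" on node 2. Treating global commit status as directly usable on every node is a simplification.

```python
# Node 1 has recorded T2's commit when T4 reads it; node 2 has not yet.
NODES = {
    "node1": {"rows": [("a", 2)], "committed": {2}},
    "node2": {"rows": [("b", 2)], "committed": set()},
}


def select_with_local_state(nodes):
    """Each node decides visibility from its own idea of what has committed."""
    result = []
    for node in nodes.values():
        result += [row for row, xid in node["rows"] if xid in node["committed"]]
    return sorted(result)


def select_with_global_snapshot(nodes, global_committed):
    """Every node applies the same snapshot, as a GTM would distribute."""
    result = []
    for node in nodes.values():
        result += [row for row, xid in node["rows"] if xid in global_committed]
    return sorted(result)
```

select_with_local_state(NODES) returns ["a"], which is only half of T2. With a shared snapshot, T4 sees either none of T2 ([] when T2 is not yet in the committed set) or all of it (["a", "b"]).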


Global Transaction Manager
(GTM)
● Provides Global Transaction Consistency

[Diagram: the GTM supplies XIDs, snapshots, timestamps, and sequence values to the cluster nodes.]


Transaction Management
● 2PC is used to guarantee transactional consistency
across nodes
● When more than one node is involved, OR
● When there are explicit 2PC transactions
● Only the nodes where write activity has happened participate in 2PC
● In PostgreSQL, 2PC cannot be applied if temporary tables are involved. The same restriction applies in Postgres-XC
● When a single coordinator command needs multiple datanode commands, they are wrapped in a transaction block
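The commit rules above can be sketched as a planning function on the coordinator. This is hypothetical. The function name is made up, and it assumes "more than one node involved" means more than one node with writes, since only written nodes take part in 2PC anyway.

```python
def commit_plan(written_nodes, explicit_2pc=False, uses_temp_tables=False):
    """Return (protocol, participants) for committing a transaction.

    written_nodes: datanodes on which the transaction performed writes.
    Read-only nodes are left out; they just end their local transaction.
    """
    participants = sorted(written_nodes)
    needs_2pc = len(participants) > 1 or explicit_2pc
    if needs_2pc and uses_temp_tables:
        # Same restriction as PostgreSQL: PREPARE TRANSACTION is not allowed
        # for transactions that touched temporary tables.
        raise ValueError("2PC cannot be used with temporary tables")
    return ("2pc" if needs_2pc else "one-phase", participants)
```

A transaction that wrote only to dn1 commits in one phase, while one that wrote to dn1 and dn2 goes through PREPARE/COMMIT PREPARED on exactly those two nodes. Limiting 2PC to written nodes keeps the extra round trip off read-only participants.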

Postgres-XC Considerations


Can GTM be a Performance Bottleneck?
• Depending on implementation
– Current implementation:
[Diagram: Coordinator backends call the GTM through a client library; the GTM main thread creates and terminates one GTM thread per connection, and the threads share snapshot data under a lock. Applicable up to five PG-XC servers (DBT-1).]
– Large snapshot size and number
– Too many interactions between GTM and Coordinators
July 12th, 2012 42

Can GTM be a Performance Bottleneck?
Proxy Implementation
[Diagram: Coordinator backends talk to a GTM Proxy over Unix domain sockets; the proxy forwards their requests to the GTM over the network, where GTM worker threads share snapshot data.]
• Request/Response grouping
• Single representative snapshot applied to multiple transactions
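The two proxy optimizations can be sketched with an in-process stand-in for the GTM. Everything here is hypothetical (FakeGTM, GTMProxy, starting XIDs at 100). It only shows how grouping turns many snapshot requests into a single GTM round trip.

```python
import itertools


class FakeGTM:
    """Stand-in for the GTM: hands out XIDs and snapshots, counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self._next_xid = itertools.count(100)
        self.running = set()

    def begin(self):
        xid = next(self._next_xid)
        self.running.add(xid)
        return xid

    def get_snapshot(self):
        self.round_trips += 1
        return frozenset(self.running)


class GTMProxy:
    """Collects snapshot requests from local backends and answers them together."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def request_snapshot(self, backend_id):
        self.pending.append(backend_id)  # queued, not yet sent to the GTM

    def flush(self):
        snapshot = self.gtm.get_snapshot()  # one request for the whole group
        replies = {backend: snapshot for backend in self.pending}
        self.pending.clear()
        return replies
```

If three backends ask for a snapshot before flush() runs, the GTM sees one request instead of three, and all three backends receive the same representative snapshot. That is safe here because it was taken after all of their requests arrived.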


Can GTM be a SPOF?
• Implement GTM Standby
[Diagram: the GTM Master checkpoints the next starting point (GXID and sequence values) to the GTM Standby.]
• The standby can take over from the master without referring to GTM master information.


Parallel Query
● OK for simple queries
● Also when all joins can be pushed down
– Star schema with replicated dimensions
● Even aggregates
● SELECT SUM(col1) FROM tab1
● Performs poorly if a cross-node join is needed
● Data on one node needs to join with data on another
● All data is shipped to the coordinator for joining
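SELECT SUM(col1) FROM tab1 parallelizes well because a sum of per-node sums equals the overall sum. This is a hypothetical sketch of that split, with made-up node contents.

```python
def datanode_sum(rows):
    """Runs on each datanode: SUM(col1) over its local rows only."""
    return sum(rows)


def coordinator_sum(nodes):
    """Runs on the coordinator: combine the small per-node partial results."""
    partials = [datanode_sum(rows) for rows in nodes]
    return sum(partials)
```

For nodes holding [1, 2, 3], [10], and [20, 30], each datanode returns a single number (6, 10, 50) and the coordinator adds them to get 66. Only three values cross the network, unlike a cross-node join, where whole tables are shipped to the coordinator.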


High Availability
● GTM-standby provides basic HA
● No native HA for nodes
● Use HA middleware such as Pacemaker
● Each data node should be configured with
synchronous replication


Status

Settings and options


Present Status
● Project/Developer site
● http://postgres-xc.sourceforge.net/
● http://sourceforge.net/projects/postgres-xc/
● Version 1.0 available
● Base PostgreSQL version: 9.1
● Soon, PostgreSQL 9.2!
– Group commit: even more write scalability
– “Index-only Scans”
● Get Involved
● Even as just a tester

Easy way of trying it out?
● www.stormdb.com
● Not Postgres-XC, but similar
● Nothing to install, cloud hosted
● Free beta


Thank You

mason@stormdb.com
Twitter: mason_db

