[D3T1S02] Aurora Limitless Database Introduction

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Jihoon Kim
Database Specialist Solutions Architect
AWS
Aurora Limitless Database
Introduction

Common scaling dimensions
Object size, object count, query volume
[Chart: QPS over time]

Sharding
[Diagram: an application in front of a single A-Z database that is split into five shards: A-F, G-K, L-P, Q-U, V-Z]

Sharding brings … challenges
[Diagram: an application in front of several shards]
Consistency? Querying? Maintenance?

Amazon Aurora Limitless Database
Application
Scaling Managed
Limited Preview

Amazon Aurora Limitless Database
Single interface
Transactionally consistent
Millions of transactions
Distributed
Serverless
Petabytes of data
Scaling Managed

Using Limitless Database
Customer: cust_id, name, email
Order: order_id, cust_id, amount, tax_rate_id
Tax Rate: tax_rate_id, city, state, country, tax_rate

Using Limitless Database
Sharded: Customer and Order (collocated)
Reference: Tax Rate
[Diagram: Shard 1, Shard 2, and Shard 3 each hold Customer and Order data together with a copy of Tax Rate]

Create sharded customer table
SET rds_aurora.limitless_create_table_mode='sharded';
SET rds_aurora.limitless_create_table_shard_key='{"cust_id"}';
CREATE TABLE customer (
cust_id INT PRIMARY KEY NOT NULL,
name TEXT,
email VARCHAR(100)
);

Create co-located order table
SET rds_aurora.limitless_create_table_mode='sharded';
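SET rds_aurora.limitless_create_table_shard_key='{"cust_id"}';
SET rds_aurora.limitless_create_table_collocate_with='customer';
-- ORDER is a reserved word in PostgreSQL, so the table is named orders here.
CREATE TABLE orders (
order_id INT NOT NULL,
cust_id INT NOT NULL,
amount DOUBLE PRECISION NOT NULL,
-- INT to match tax_rate.tax_rate_id
tax_rate_id INT,
-- the shard key (cust_id) is part of the primary key
PRIMARY KEY (order_id, cust_id)
);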

Create reference table tax_rate
SET rds_aurora.limitless_create_table_mode='reference';
CREATE TABLE tax_rate (
tax_rate_id INT PRIMARY KEY NOT NULL,
city TEXT NOT NULL,
state TEXT,
country TEXT NOT NULL,
tax_rate DOUBLE PRECISION NOT NULL
);
SET rds_aurora.limitless_create_table_mode='standard';

Architecture overview

Standard Aurora architecture
[Diagram: an Aurora cluster with a writer instance, reader instances, and a Limitless Database shard group on top of Aurora distributed storage]
1. Aurora volume on distributed storage
2. An Aurora writer instance
3. Optional reader instances for availability and read scaling
4. Limitless Database introduces the “shard group” concept

Limitless Database shard group
[Diagram: an Aurora cluster containing distributed transaction routers, data access shards, and the primary writer]
Contained within your Aurora cluster
Encapsulates Limitless Database infrastructure for your cluster
Provides an endpoint for applications
Scales resources within configured limit based on load and data size

Distributed transaction routers
[Diagram: an Aurora cluster with the routers in front of the data access shards, next to the primary writer]
Serve all application traffic
Scale vertically and horizontally based on load
Know schema and key range placement
Assign time for transaction snapshot and drive distributed commits
Perform initial planning of query and aggregate results from multi-shard queries

Data access shards
[Diagram: an Aurora cluster with the data access shards highlighted, next to the primary writer]
Own portion of sharded table key space and have full copies of reference tables
Scale vertically and split based on load
Perform local planning and execution of query fragments
Execute local transaction logic
Backed by Aurora distributed storage

Creating a shard group
aws rds create-db-shard-group \
    --db-cluster-identifier proddb \
    --db-shard-group-identifier proddb-sg \
    --max-acu 600 \
    --compute-redundancy 2
Total scale of all routers and shards will be <= max-acu

Compute redundancy
Aurora Cluster
Availability Zone 1 Availability Zone 2 Availability Zone 3
Shard Group
Distributed Transaction Routers
Data Access Shards
S1 S2 S3
S3 S1 S2
S2 S3 S1
Compute redundancy 0
Separately configure HA for the primary writer (PW)

Data distribution

Sharded tables
set rds_aurora.limitless_create_table_mode='sharded';
set rds_aurora.limitless_create_table_shard_key='{bid}';
create table pgbench_branches(
bid int not null,
bbalance int,
filler char(88));
postgres_limitless=> \d+ pgbench_branches
                              Partitioned table "public.pgbench_branches"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
----------+---------------+-----------+----------+---------+----------+-------------+--------------+-------------
 bid      | integer       |           | not null |         | plain    |             |              |
 bbalance | integer       |           |          |         | plain    |             |              |
 filler   | character(88) |           |          |         | extended |             |              |
Partition key: HASH (bid)
Partitions: pgbench_branches_fs00001 FOR VALUES FROM (MINVALUE) TO ('-4611686018427387904'),
pgbench_branches_fs00002 FOR VALUES FROM ('-4611686018427387904') TO ('0'),
pgbench_branches_fs00003 FOR VALUES FROM ('0') TO ('4611686018427387904'),
pgbench_branches_fs00004 FOR VALUES FROM ('4611686018427387904') TO (MAXVALUE)

Hash-range partitioning
Shard key is hashed to 64 bits
Ranges of 64-bit space are assigned to shards
Shards own table fragments
Routers have table fragment references, but no data
[Diagram: pgbench_branches fragments on the data access shards, covering the ranges MINVALUE to -4611686018427387904, -4611686018427387904 to 0, 0 to 4611686018427387904, and 4611686018427387904 to MAXVALUE; the router holds only references to them]
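The lookup a router performs for a shard key can be sketched roughly as follows. This is a minimal illustration, not Aurora's implementation: the hash function (a truncated SHA-256) and the shard names are assumptions made for the example, and only the three range boundaries come from the slide.

```python
import bisect
import hashlib

# Exclusive upper bounds of the first three hash ranges, as shown on the slide.
# The fourth range runs from the last boundary up to MAXVALUE.
BOUNDARIES = [-4611686018427387904, 0, 4611686018427387904]
SHARDS = ["shard_1", "shard_2", "shard_3", "shard_4"]  # hypothetical names


def hash64(key: int) -> int:
    """Map a shard key to a signed 64-bit value (stand-in hash, not Aurora's)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def shard_for_hash(h: int) -> str:
    """Return the shard whose range [lower, upper) contains the hash value."""
    return SHARDS[bisect.bisect_right(BOUNDARIES, h)]


def shard_for_key(key: int) -> str:
    """Hash the shard key, then look up the owning shard by range."""
    return shard_for_hash(hash64(key))
```

Because ownership is decided by range rather than by `hash % shard_count`, a range can later be divided without rehashing every key.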

Table slicing
Table fragments are partitioned into sub-range slices
Not directly visible to users
Improve intra-shard parallelism
Relocate on horizontal scale out

Horizontal scale out
“Shard split” occurs due to utilization or storage size
Collocated table slices moved together
Leverages Aurora storage level cloning and replication
Routers can be added or removed
[Diagram: accounts and branches fragments on the data access shards, with fragment references on the routers. Before the split the ranges are MINVALUE to -4611686018427387904, -4611686018427387904 to 0, 0 to 4611686018427387904, and 4611686018427387904 to MAXVALUE. After the split the last range is shown as two fragments labeled 4611686018427387904 to 9223372036854775808 and 9223372036854775808 to MAXVALUE]
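A shard split that relocates slices can be pictured with the toy model below. It is only a sketch under stated assumptions: the slice count, the even slice widths, and the choice to move the upper half are all invented for illustration, and MAXVALUE is modeled as the exclusive bound 2**63. The source does not say how Aurora picks the split point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    lower: int     # inclusive lower bound of the hash sub-range
    upper: int     # exclusive upper bound of the hash sub-range
    tables: tuple  # collocated tables that share this sub-range


def make_slices(lower, upper, count, tables):
    """Cut one shard's hash range into `count` contiguous slices."""
    step = (upper - lower) // count
    bounds = [lower + i * step for i in range(count)] + [upper]
    return [Slice(bounds[i], bounds[i + 1], tables) for i in range(count)]


def split_shard(slices):
    """Keep the lower half of the slices; hand the upper half to a new shard."""
    half = len(slices) // 2
    return slices[:half], slices[half:]


TABLES = ("pgbench_accounts", "pgbench_branches")
slices = make_slices(4611686018427387904, 2**63, 4, TABLES)
kept, moved = split_shard(slices)
```

Each slice carries every collocated table for its sub-range, so moving a slice moves the matching accounts and branches rows together.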

Reference tables
set rds_aurora.limitless_create_table_mode='reference';
create table pgbench_rates(
pid int not null primary key,
term int,
rate numeric not null);
[Diagram: a full copy of pgbench_rates on every data access shard]
Strongly consistent (ACID writes)
Enables join pushdown
Frequent read/join, infrequent write

Transactions

Transaction support
Support for READ COMMITTED and REPEATABLE READ
…with a consistent view as in a single system

Isolation level refresher
READ COMMITTED
See the latest committed data before your query began
Every query in a transaction could see different data
REPEATABLE READ
See the latest committed data before your transaction began
Every query in a transaction sees the same data
Design goal: Retain PostgreSQL transaction semantics
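The difference between the two levels comes down to when the snapshot is taken. The toy versioned store below is a sketch, not PostgreSQL's MVCC: the `Store` and `Transaction` classes and the integer logical clock are assumptions for illustration, and the snapshot for REPEATABLE READ is taken when the transaction object is created, which simplifies PostgreSQL's actual behavior.

```python
class Store:
    """A tiny versioned key-value store with a logical commit clock."""

    def __init__(self):
        self.clock = 0
        self.versions = {}  # key -> list of (commit_time, value), in commit order

    def commit(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot):
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot]
        return visible[-1] if visible else None


class Transaction:
    def __init__(self, store, isolation):
        self.store = store
        self.isolation = isolation
        self.snapshot = store.clock  # stays fixed under REPEATABLE READ

    def select(self, key):
        if self.isolation == "READ COMMITTED":
            # A fresh snapshot for every query.
            self.snapshot = self.store.clock
        return self.store.read(key, self.snapshot)


store = Store()
store.commit("abalance", 704)
rc = Transaction(store, "READ COMMITTED")
rr = Transaction(store, "REPEATABLE READ")
first = (rc.select("abalance"), rr.select("abalance"))
store.commit("abalance", 1001)  # another session commits in between
second = (rc.select("abalance"), rr.select("abalance"))
```

After the concurrent commit, the READ COMMITTED transaction sees the new value while the REPEATABLE READ transaction keeps seeing the value from its original snapshot.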

Challenges in a distributed database
Coordination limits scalability
Query fragments execute at different times
Transaction scope unknown until commit
Maintain order
Consistent restores

Solved with bounded clocks
Based on EC2 TimeSync service
- Current time (approximate)
- Earliest possible time
- Latest possible time
Integrated into PostgreSQL
- Tuple visibility based on time of snapshot and commit
- Global read-after-write
- One-phase & two-phase commit
Repeatable read – distributed (with clocks)
Transaction T1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT abalance FROM pgbench_accounts WHERE bid = 619 and aid = 61890340;
704
1
Steps for T1's read:
1) router gets time t100
2) execute on shard w/bid 619 using snapshot@t100
Transaction T2
BEGIN;
… WHERE bid = 801 and aid = 80044011 FOR UPDATE;
1
UPDATE pgbench_accounts SET abalance = 1001 WHERE bid = 801 and aid = 80044011;
COMMIT;
Steps for T2's commit:
1) router uses 1PC on shard
2) shard assigns commit@t110
3) acks commit when
   a) writes durable on disk
   b) earliest possible time > t110
Transaction T3
1001
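The rule that a commit is acknowledged only once the earliest possible time has passed the commit time can be sketched with a simulated clock. Everything here is an assumption for illustration: the `BoundedClock` class, the fixed error bound of 3 ticks, and the manual `advance` call that stands in for real waiting. The durability condition from the slide is not modeled.

```python
from dataclasses import dataclass


@dataclass
class BoundedClock:
    true_time: int  # simulated time, advanced manually in this sketch
    error: int      # assumed maximum clock error

    def current(self):
        return self.true_time               # approximate current time

    def earliest(self):
        return self.true_time - self.error  # earliest possible time

    def latest(self):
        return self.true_time + self.error  # latest possible time

    def advance(self, ticks):
        self.true_time += ticks


def commit_with_wait(clock, tick=1):
    """Assign a commit time, then wait until no clock could still be behind it."""
    commit_time = clock.current()
    waited = 0
    while clock.earliest() <= commit_time:
        clock.advance(tick)  # stands in for sleeping
        waited += tick
    return commit_time, waited


clock = BoundedClock(true_time=110, error=3)
commit_time, waited = commit_with_wait(clock)
```

Once the wait ends, any router that takes a snapshot afterwards gets a time later than the commit time, so the committed write is visible to it. That is how the wait provides global read-after-write without coordination between routers.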

Read committed – distributed
Transaction T1
SELECT SUM(abalance) FROM pgbench_accounts;
10000000
2) executes sum() on each shard using snapshot@t100
3) aggregates the result
Transaction T2
BEGIN;
… abalance - 500 WHERE bid = 619 and aid = 61890340;
… abalance + 500 WHERE bid = 801 and aid = 80044011;
COMMIT;
1) router determines 2PC, asks shards to prepare
2) shard w/619 prepares@t118, shard w/801 prepares@t112
3) router assigns commit@t120
4) acks commit when
   a) writes durable on disk
   b) earliest possible time > t120
5) router tells shards to commit@t120
Transaction T3
SELECT SUM(abalance) FROM pgbench_accounts;
10000000
2) executes sum() on each shard using snapshot@t116
3) aggregates the result
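The reason both SUM queries return 10000000 is that the two writes become visible at one shared commit time. The sketch below replays the slide's timestamps against a toy version list per shard. The starting balances and shard names are invented, and the only commit-time rule assumed is that the router's commit time follows every prepare time.

```python
def read_at(versions, snapshot):
    """Return the newest balance whose commit time is <= snapshot."""
    value = None
    for commit_time, balance in versions:  # versions are kept in commit order
        if commit_time <= snapshot:
            value = balance
    return value


def total_at(shards, snapshot):
    """Router-side SUM: each shard reads at the same snapshot time."""
    return sum(read_at(versions, snapshot) for versions in shards.values())


def two_phase_commit(shards, new_balances, prepare_times, commit_time):
    """Apply every write with the single commit time chosen by the router."""
    if commit_time <= max(prepare_times.values()):
        raise ValueError("commit time must follow every prepare time")
    for shard, balance in new_balances.items():
        shards[shard].append((commit_time, balance))


# Hypothetical starting balances that add up to 10,000,000.
shards = {"shard_619": [(0, 4_000_000)], "shard_801": [(0, 6_000_000)]}
two_phase_commit(
    shards,
    new_balances={"shard_619": 3_999_500, "shard_801": 6_000_500},
    prepare_times={"shard_619": 118, "shard_801": 112},
    commit_time=120,
)
```

A snapshot at t100 or t116 sees neither write and a snapshot at t120 or later sees both, so no reader can observe the debit without the credit.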

Transactions conclusion
Same RC/RR semantics as PostgreSQL
All reads are consistent, w/o quorum, even on failover
Commits w/single shard writes scale linearly (millions/sec)
Distributed commits are atomic

Queries & SQL compatibility

Fundamentally Aurora PostgreSQL
PostgreSQL parser and semantics
Broad surface area coverage
Selected extensions
PostgreSQL wire compatible

Query execution basics
PostgreSQL foreign tables foundation
Enhancements in core engine
A custom foreign data wrapper

Query flow
Router
1. Receives query from client
2. Plans what can be sent to shards and any joins that must be done
3. Sends partial queries to shards with transaction context
Shard
4. Receives partial query from router
5. Plans local joins and scans
6. Executes and sends results to router
Router
7. Does final joins, filters, and aggregations as necessary
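The division of work between router and shards resembles a scatter-gather pattern, sketched below for an average. This is an illustration only: the shard contents are invented, a thread pool stands in for network fan-out, and query planning and transaction context are left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical abalance values held by each shard.
SHARDS = {
    "shard_1": [100, 250],
    "shard_2": [50],
    "shard_3": [300, 200, 300],
}


def shard_execute(balances):
    """Steps 4-6: a shard scans its local rows and returns a partial result."""
    return sum(balances), len(balances)


def router_avg(shards):
    """Steps 1-3 and 7: fan out partial queries, then combine the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(shard_execute, shards.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


avg = router_avg(SHARDS)
```

Each shard returns a (sum, count) pair rather than its own average, because per-shard averages cannot be combined correctly when shards hold different numbers of rows. The final aggregation therefore has to happen on the router.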

Single shard queries
Best performance when router determines query goes to a single shard

Parallel operations
Operations that run in parallel across shards get faster with multiple shards
Some examples:
Create index
Analyze
Vacuum
Aggregates (sum, min, max, etc.)

Thank you!
Please complete the session
survey in the mobile app