MariaDB ColumnStore
Understanding the Architecture
Andrew Hutchings (LinuxJedi)
Lead Software Engineer,
MariaDB ColumnStore
Who Am I?
● Andrew Hutchings, aka “LinuxJedi”
● Lead Software Engineer for MariaDB’s ColumnStore
● Previously worked for:
○ NGINX - Senior Developer Advocate / Technical Product Manager
○ HP - Principal Software Engineer (HP Cloud / ATG)
○ SkySQL - Senior Sustaining Engineer
○ Rackspace - Senior Software Engineer
○ Sun/Oracle - MySQL Senior Support Engineer
● Co-author of MySQL 5.1 Plugin Development
● IRC/Twitter: LinuxJedi
● Email: linuxjedi@mariadb.com
Overview
● History of MariaDB ColumnStore
● Technical Use Case
● Components of MariaDB ColumnStore
● Disk Storage
● Writing Data
● Querying Data
● Optimizing for MariaDB ColumnStore
● Closing Notes
● Questions
History of MariaDB ColumnStore
● March 2010 - Calpont launches InfiniDB
● September 2014 - Calpont (now itself called InfiniDB) closes down
○ MariaDB (then SkySQL) supports InfiniDB customers
● April 2016 - MariaDB announces development of MariaDB ColumnStore
● August 2016 - I joined MariaDB and jumped straight into ColumnStore
● December 2016 - MariaDB ColumnStore 1.0 GA
○ InfiniDB + MariaDB 10.1 + Many fixes and improvements
● November 2017 - MariaDB ColumnStore 1.1 GA
○ MariaDB 10.2 + APIs + Even more improvements
Technical Use Case
MariaDB ColumnStore
● Very large data sets
○ Many columns
○ Many millions of rows
● Complex joins and aggregates
● Rapid bulk data insertion
○ The larger the batch the better
Traditional OLTP Engines
● Smaller data sets
● Basic queries
● Lots of DML queries
● Complex data types
Data Types
● INT types - range is reduced by 2: the top 2 values for unsigned types, the bottom 2 values for signed types
● CHAR† - max 255 bytes
● VARCHAR† - max 8000 bytes
● DECIMAL - max 18 digits
● DOUBLE/FLOAT
● DATETIME - no sub-seconds (coming in 1.2)
● DATE
● BLOB/TEXT†
† Empty string is the same as NULL
Other DDL Differences
● No indexes
○ Columns are somewhat self-indexing
● Auto increment is handled differently (a table comment)
● No constraints
● PARTITION syntax not supported
○ Columns are partitioned automatically
Row-oriented vs. Column-oriented Format
Row-oriented:
ID  Fname     Lname  State  Zip    Phone           Age  Sex
1   Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2   Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3   Daffy     Duck   NY     10013  (212) 227-1810  35   M
4   Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5   Witch     Hazel  MA     01970  (978) 744-0991  57   F
Column-oriented:
ID:    1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip:   11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age:   34, 52, 35, 43, 57
Sex:   M, M, M, M, F
SELECT Fname FROM People WHERE State = 'NY'
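A minimal Python sketch of why the query above suits a column-oriented layout. The dictionary-of-lists structure and the function name are illustrative assumptions, not ColumnStore's on-disk format; the point is only that the filter and projection read two of the eight columns.

```python
# Column-oriented copy of the People table: one list per column.
# This in-memory layout is an illustration only, not ColumnStore's file format.
people_columns = {
    "ID": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "Lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
    "Zip": ["11217", "95389", "10013", "04578", "01970"],
    "Age": [34, 52, 35, 43, 57],
    "Sex": ["M", "M", "M", "M", "F"],
}


def select_fname_where_state(columns, state):
    """SELECT Fname FROM People WHERE State = <state>, reading two columns only."""
    # Filter step: scan the State column to find matching row positions.
    matches = [i for i, value in enumerate(columns["State"]) if value == state]
    # Projection step: fetch Fname only at those positions.
    return [columns["Fname"][i] for i in matches]


result = select_fname_where_state(people_columns, "NY")
print(result)
```

In a row-oriented layout the same query would have to read every field of every row to reach State and Fname.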
Components of MariaDB ColumnStore
ColumnStore Modules
● User Module (UM)
○ MariaDB Server / Storage Engine Plugin
○ ExeMgr
○ DMLProc, DDLProc
● Performance Module (PM)
○ PrimProc
○ WriteEngine
○ ProcMgr / ProcMon
○ DBRM
Query Processing
[Diagram: SQL arrives at the User Module (UM), which sends column primitives down to the Performance Modules (PM); the PMs send intermediate results back up to the UM. Data storage is shared-nothing and distributed.]
Hardware Requirements
● Lots of RAM
○ minimum 32GB for UM, 16GB for PM
○ minimum 4GB to try out a single server on a VM
● Optimised for HDD spindles, will still work with SSD
○ We are looking into SSD optimisation soon
● More cores typically better
○ 8 core minimum recommendation
● On AWS, m4.4xlarge is the recommended minimum
Disk Storage
Column Types
● 1-byte field: 8192 values per 8KB block
● 2-byte field: 4096 values per 8KB block
● 4-byte field: 2048 values per 8KB block
● 8-byte field: 1024 values per 8KB block
● Dictionary structure, made up of 2 files/extents with:
○ An 8-byte fixed-length token (pointer)
○ A variable-length value stored at the location identified by the pointer
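The values-per-block figures for fixed-width fields follow directly from dividing the 8KB block size by the field width. A short Python check (the constant and function names are illustrative):

```python
BLOCK_SIZE_BYTES = 8 * 1024  # 8KB block, as stated on the slide


def values_per_block(width_bytes):
    """Number of fixed-width values that fit in one 8KB block."""
    return BLOCK_SIZE_BYTES // width_bytes


per_block = {width: values_per_block(width) for width in (1, 2, 4, 8)}
for width, count in per_block.items():
    print(f"{width}-byte field: {count} values per block")
```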
Extent Map
Object ID                                  The ID for the column (or dictionary)
Object Type                                Column or Dictionary
LBID Start / End                           Start / End Logical Block Pointer
Minimum Value                              Lowest value in the extent
Maximum Value                              Highest value in the extent
Width                                      Column Width
DBRoot                                     DBRoot (disk partition) number
Partition ID / Segment ID / Block Offset   The extent number
High Water Mark                            Atomic last block pointer
Disk Storage
Logical layer:
● Blocks (8KB)
● Extent 1 (8MB to 64MB, 8 million rows)
Physical layer:
● Segment File 1 (maps to an extent)
● Compression chunks
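The 8MB to 64MB extent range can be reproduced from the row count and the column width. This sketch assumes "8 million rows" means 8 × 1024 × 1024 rows and that the sizes are uncompressed; both are assumptions made for the arithmetic, not statements from the slides.

```python
ROWS_PER_EXTENT = 8 * 1024 * 1024  # assumption: "8 million" means 8 * 2**20 rows
BLOCK_SIZE_BYTES = 8 * 1024        # 8KB blocks
MB = 1024 * 1024


def extent_size_mb(width_bytes):
    """Uncompressed extent size in MB for a fixed-width column."""
    return ROWS_PER_EXTENT * width_bytes // MB


def blocks_per_extent(width_bytes):
    """Number of 8KB blocks needed to hold one extent of this width."""
    return ROWS_PER_EXTENT * width_bytes // BLOCK_SIZE_BYTES


sizes = {width: extent_size_mb(width) for width in (1, 2, 4, 8)}
for width, size in sizes.items():
    print(f"{width}-byte column: {size}MB extent, {blocks_per_extent(width)} blocks")
```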
Writing Data
Inserting Data
● Multiple methods
○ Single INSERTs
○ INSERT...SELECT
○ LOAD DATA INFILE
○ cpimport
○ Bulk Write API
● Designed for large bulk inserts
● Inserts are appended at the end of extents (or new extents created)
○ This means reads are not affected
○ A High Water Mark pointing to the last block is moved at the end of the insert
cpimport
● Uses CSV files or piped CSV data
● Fastest way to get data into ColumnStore
● Does minimal data conversion and pipes it straight into the PMs
○ Works by appending new blocks to the table and moving an atomic block pointer (HWM)
○ No UNDO log needed (atomic pointer not moved on rollback)
○ Therefore can cause a gap of 0-64KB in a column
● Can load multiple tables simultaneously
● Can load into multiple PMs for the same table simultaneously
● Can load into specific PMs for physical partitioning by PM
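The append-then-move-the-pointer idea can be sketched as a toy Python model. The class and method names are hypothetical, and the model is deliberately simplified: it overwrites uncommitted blocks on the next load and does not reproduce the 0-64KB gap mentioned above.

```python
class AppendOnlyColumn:
    """Toy model of a column whose visible end is a high water mark (HWM)."""

    def __init__(self):
        self.blocks = []  # blocks written so far, committed or not
        self.hwm = 0      # number of blocks visible to readers

    def bulk_load(self, new_blocks, commit=True):
        # Write starting at the HWM; blocks below it are never modified,
        # so concurrent readers are unaffected.
        self.blocks[self.hwm:] = list(new_blocks)
        if commit:
            # Publishing the load is a single pointer move to the new end.
            self.hwm = len(self.blocks)
        # On rollback the pointer simply stays where it was, which is why
        # no undo log is needed in this model.

    def read(self):
        # Readers only ever see blocks below the HWM.
        return self.blocks[: self.hwm]


column = AppendOnlyColumn()
column.bulk_load(["block-1", "block-2"])
column.bulk_load(["bad-block"], commit=False)  # rolled back load
after_rollback = column.read()
column.bulk_load(["block-3"])
after_commit = column.read()
print(after_rollback, after_commit)
```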
Bulk Write API
● A simple C++ API to inject data into the PMs
○ Bindings in Python and Java available
● Works in a similar way to cpimport
○ Append new blocks and an atomic block pointer (HWM)
● LGPL licensed
DML Writes
● Regular INSERT / UPDATE / DELETE
○ Also INSERT...SELECT and LOAD DATA INFILE when autocommit is off
● Slow compared to other engines
○ INSERT is very slow compared to cpimport
● Requires the use of a version buffer for an undo log
○ But INSERT appends to data blocks so no wasted storage
● Data sent to DMLProc to process
A Note About DELETE
● DELETE needs to touch every column and the undo log
○ So it is very slow
● It also leaves a gap in the column that won’t be filled
● Marking rows as deleted in a dedicated column with an UPDATE query is faster
● Dropping entire partitions is instantaneous
○ Partitions can be disabled first
INSERT...SELECT / LOAD DATA INFILE
● Injects the binary row data from MariaDB into cpimport
● Good for backwards compatibility with tools and remote loading
● cpimport then injects this data into the column extent files
○ In 1.2 it will use the write API instead
● If autocommit is turned off this will behave like regular DML instead (slow)
Querying Data
Physical Execution Layout
[Diagram, labeled "Round Robin": MariaDB Client → MariaDB Server → ExeMgr (×2) → PrimProc (×4)]
Extent Elimination
Storage architecture reduces I/O
● Only touch column files that are in filter, projection, group by, and join conditions
● Eliminate disk block touches to partitions outside filter and join conditions
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between '2016-01-01' and '2016-01-31'
GROUP BY Item
Id     OrderId  Line  Item     Quantity  Price  Supplier  ShipDate    ShipMode
1      1        1     Laptop   5         1000   Dell      2016-01-12  G
2      1        2     Monitor  5         200    LG        2016-01-13  G
3      2        1     Mouse    1         20     Logitech  2016-02-05  M
4      3        1     Laptop   3         1600   Apple     2016-01-31  P
...    ...      ...   ...      ...       ...    ...       ...         ...
8M                                                        2016-03-05
8M+1                                                      2016-03-05
...    ...      ...   ...      ...       ...    ...       ...         ...
16M                                                       2016-09-23
16M+1                                                     2016-09-24
...    ...      ...   ...      ...       ...    ...       ...         ...
24M                                                       2017-01-06
Each extent is a horizontal partition of 8 million rows:
Extent 1: ShipDate 2016-01-12 - 2016-03-05
Extent 2: ShipDate 2016-03-05 - 2016-09-23 (ELIMINATED PARTITION)
Extent 3: ShipDate 2016-09-24 - 2017-01-06 (ELIMINATED PARTITION)
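Extent elimination for the query above can be sketched as a range-overlap test against the minimum and maximum values stored per extent. The list-of-dicts extent map and the function name are illustrative assumptions; the date ranges are the ones shown on the slide.

```python
from datetime import date

# Min/max ShipDate per extent, taken from the slide.
extent_map = [
    {"extent": 1, "min": date(2016, 1, 12), "max": date(2016, 3, 5)},
    {"extent": 2, "min": date(2016, 3, 5), "max": date(2016, 9, 23)},
    {"extent": 3, "min": date(2016, 9, 24), "max": date(2017, 1, 6)},
]


def extents_to_scan(entries, low, high):
    """Return extents whose [min, max] range overlaps the BETWEEN range."""
    return [e["extent"] for e in entries if e["min"] <= high and e["max"] >= low]


# WHERE ShipDate between '2016-01-01' and '2016-01-31'
to_scan = extents_to_scan(extent_map, date(2016, 1, 1), date(2016, 1, 31))
eliminated = [e["extent"] for e in extent_map if e["extent"] not in to_scan]
print("scan:", to_scan, "eliminated:", eliminated)
```

Only the extent map is consulted to make this decision, so the eliminated extents cost no block reads at all.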
Query Analysis
MariaDB [tpch1]> select calsettrace(1);
...
MariaDB [tpch1]> select c_count, count(*) as custdist
-> from ( select c_custkey, count(o_orderkey) c_count
-> from v_customer left outer join v_orders on c_custkey = o_custkey
-> and o_comment not like '%special%requests%'
-> group by c_custkey ) c_orders
-> group by c_count
-> order by custdist desc, c_count desc;
...
42 rows in set, 1 warning (9.07 sec)
MariaDB [tpch1]> select calgetstats()\G
*************************** 1. row ***************************
calgetstats(): Query Stats: MaxMemPct-4; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-0; CacheI/O-12503;
BlocksTouched-12503; PartitionBlocksEliminated-812; MsgBytesIn-102MB; MsgBytesOut-3KB; Mode-Distributed
1 row in set (0.00 sec)
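The stats line is a list of Name-Value pairs separated by semicolons, so it is easy to pull apart for a quick look at how much elimination helped. This Python sketch parses the exact line shown above; the helper name and the "share of candidate blocks eliminated" ratio are assumptions made for illustration, not an official metric.

```python
stats_line = (
    "Query Stats: MaxMemPct-4; NumTempFiles-0; TempFileSpace-0B; "
    "ApproxPhyI/O-0; CacheI/O-12503; BlocksTouched-12503; "
    "PartitionBlocksEliminated-812; MsgBytesIn-102MB; MsgBytesOut-3KB; "
    "Mode-Distributed"
)


def parse_query_stats(line):
    """Split 'Name-Value; Name-Value' pairs into a dict of strings."""
    body = line.split(":", 1)[1]  # drop the "Query Stats" prefix
    stats = {}
    for pair in body.split(";"):
        pair = pair.strip()
        if pair:
            name, value = pair.split("-", 1)
            stats[name] = value
    return stats


stats = parse_query_stats(stats_line)
touched = int(stats["BlocksTouched"])
eliminated = int(stats["PartitionBlocksEliminated"])
# Assumed ratio: eliminated blocks out of all blocks that were candidates.
eliminated_share = eliminated / (touched + eliminated)
print(f"{eliminated_share:.1%} of candidate blocks were eliminated")
```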
Query Analysis
MariaDB [tpch1]> select calgettrace()\G
*************************** 1. row ***************************
calgettrace():
Desc Mode Table               TableOID ReferencedColumns                PIO LIO   PBE Elapsed Rows
BPS  PM   customer            7254     (c_custkey)                      0   75    0   0.032   150000
TNS  UM   -                   -        -                                -   -     -   0.045   150000
BPS  PM   customer            7254     (c_custkey)                      0   0     75  0.000   0
TNS  UM   -                   -        -                                -   -     -   0.000   0
TUS  UM   -                   -        -                                -   -     -   0.303   150000
BPS  PM   orders              7268     (o_comment,o_custkey,o_orderkey) 0   12428 0   2.293   1500000
TNS  UM   -                   -        -                                -   -     -   2.967   1500000
BPS  PM   orders              7268     (o_comment,o_custkey,o_orderkey) 0   0     737 0.000   0
TNS  UM   -                   -        -                                -   -     -   0.000   0
TUS  UM   -                   -        -                                -   -     -   3.796   1500000
HJS  UM   v_customer-v_orders -        -                                -   -     -   -----   -
TAS  UM   -                   -        -                                -   -     -   1.658   150000
TNS  UM   -                   -        -                                -   -     -   0.044   150000
TAS  UM   -                   -        -                                -   -     -   0.050   42
1 row in set (0.01 sec)
Cross Engine Joins
● Allows non-ColumnStore tables to join with ColumnStore
● The whole query is processed by ColumnStore
● Cross Engine makes new MariaDB connections to retrieve data from non-ColumnStore tables
[Diagram: the original query goes from the MariaDB Server to ExeMgr; ExeMgr sends the non-ColumnStore query (Cross Engine) back to the MariaDB Server.]
Optimizing for MariaDB ColumnStore
Data Modeling
● Star-schema optimizations are generally a good idea
● Conservative data typing is very important
○ Especially around fixed-length vs. dictionary boundary (8 bytes)
○ IP Address vs. IP Number
● Break down compound fields into individual fields:
○ Trivializes searching for sub-fields
○ Can avoid dictionary overhead
○ Cost to re-assemble is generally small
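The "IP Address vs. IP Number" point can be illustrated in Python: a dotted-quad string is up to 15 bytes, past the 8-byte dictionary boundary, while the same address as a number fits a 4-byte integer. The helper names are illustrative, and doing the conversion in the loading script (rather than in SQL) is just one option.

```python
import ipaddress


def ip_to_number(text):
    """Convert dotted-quad IPv4 text to an integer that fits in 4 bytes."""
    return int(ipaddress.IPv4Address(text))


def number_to_ip(number):
    """Re-assemble the dotted-quad text from the stored integer."""
    return str(ipaddress.IPv4Address(number))


address = "192.168.1.10"
number = ip_to_number(address)
print(len(address), "bytes as text vs 4 bytes as an integer:", number)
```

Given the reduced integer range noted under Data Types, the two highest addresses would likely need special handling in an unsigned 4-byte column.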
Data Insertion
● Order data as best you can before inserting
○ Helps extent elimination when min/max range for an extent is small
● Insert in large batches using cpimport or bulk write API
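A small Python simulation of why ordering helps extent elimination. It chunks values into "extents" in insertion order, records each chunk's min/max, and counts how many chunks a range filter must scan. The tiny extent size, the synthetic day numbers and the function names are all assumptions chosen to keep the example short.

```python
def build_extents(values, rows_per_extent):
    """Chunk values in insertion order and record each chunk's (min, max)."""
    chunks = (
        values[i : i + rows_per_extent]
        for i in range(0, len(values), rows_per_extent)
    )
    return [(min(chunk), max(chunk)) for chunk in chunks]


def extents_scanned(extents, low, high):
    """Count extents whose min/max range overlaps the filter range."""
    return sum(1 for lo, hi in extents if lo <= high and hi >= low)


ROWS_PER_EXTENT = 100  # tiny stand-in for the real 8 million rows

# 1000 distinct day numbers arriving in a scattered order (synthetic data).
unordered = [(i * 37) % 1000 for i in range(1000)]
ordered = sorted(unordered)

scanned_unordered = extents_scanned(build_extents(unordered, ROWS_PER_EXTENT), 100, 130)
scanned_ordered = extents_scanned(build_extents(ordered, ROWS_PER_EXTENT), 100, 130)
print("unordered load scans", scanned_unordered, "extents")
print("ordered load scans", scanned_ordered, "extents")
```

With scattered input every extent spans almost the whole value range, so none can be eliminated; with ordered input each extent covers a narrow range.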
Improving Your Queries
● Avoid filtering on a VARCHAR/CHAR column of 8 bytes or more where possible
○ Two extents (token and dictionary) need to be read per column, and there is no extent elimination
● Use extent map elimination where possible
● Don’t use a function to filter
○ Extent elimination won’t happen
● Only reference required columns, avoid “SELECT *”
● Use the smallest possible data type for your data
● Avoid large ORDER BY
● Read https://mariadb.com/kb/en/mariadb/columnstore-performance-tuning/
Tuning
● Generally self-tuning
○ Uses as much RAM as possible automatically
○ Uses all CPU cores
● More RAM in PMs = more LRU data cache
● More RAM in UMs = ability to process aggregates / joins on bigger data sets
○ Disk joins are possible
Closing Notes
MariaDB ColumnStore 1.2 (later in 2018)
● MariaDB 10.3 base
● TIME datatype
● Microsecond support
● Improvements to LOAD DATA INFILE and INSERT...SELECT
● Phase 1 of MariaDB ColumnStore Storage Engine Convergence project
● Many other cool things
Thank you!
linuxjedi@mariadb.com
Twitter: @linuxjedi

More Related Content

What's hot

M|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera ClusterM|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera Cluster
MariaDB plc
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
MariaDB plc
 
M|18 Why Abstract Away the Underlying Database Infrastructure
M|18 Why Abstract Away the Underlying Database InfrastructureM|18 Why Abstract Away the Underlying Database Infrastructure
M|18 Why Abstract Away the Underlying Database Infrastructure
MariaDB plc
 
MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStore
MariaDB plc
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
MariaDB plc
 
How to migrate from Oracle Database with ease
How to migrate from Oracle Database with easeHow to migrate from Oracle Database with ease
How to migrate from Oracle Database with ease
MariaDB plc
 
MariaDB Platform for hybrid transactional/analytical workloads
MariaDB Platform for hybrid transactional/analytical workloadsMariaDB Platform for hybrid transactional/analytical workloads
MariaDB Platform for hybrid transactional/analytical workloads
MariaDB plc
 
How QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterHow QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it faster
MariaDB plc
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX Platform
MariaDB plc
 
What’s new in Galera 4
What’s new in Galera 4What’s new in Galera 4
What’s new in Galera 4
MariaDB plc
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
MariaDB plc
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
ScyllaDB
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
ScyllaDB
 
What's new in MariaDB Platform X3
What's new in MariaDB Platform X3What's new in MariaDB Platform X3
What's new in MariaDB Platform X3
MariaDB plc
 
Pgxc scalability pg_open2012
Pgxc scalability pg_open2012Pgxc scalability pg_open2012
Pgxc scalability pg_open2012
Ashutosh Bapat
 
M|18 How DBAs at TradingScreen Make Life Easier With Automation
M|18 How DBAs at TradingScreen Make Life Easier With AutomationM|18 How DBAs at TradingScreen Make Life Easier With Automation
M|18 How DBAs at TradingScreen Make Life Easier With Automation
MariaDB plc
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
ScyllaDB
 
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Corporation
 
Managing terabytes: When Postgres gets big
Managing terabytes: When Postgres gets bigManaging terabytes: When Postgres gets big
Managing terabytes: When Postgres gets big
Selena Deckelmann
 
Using Pentaho with MariaDB ColumnStore
Using Pentaho with MariaDB ColumnStoreUsing Pentaho with MariaDB ColumnStore
Using Pentaho with MariaDB ColumnStore
MariaDB plc
 

What's hot (20)

M|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera ClusterM|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera Cluster
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
 
M|18 Why Abstract Away the Underlying Database Infrastructure
M|18 Why Abstract Away the Underlying Database InfrastructureM|18 Why Abstract Away the Underlying Database Infrastructure
M|18 Why Abstract Away the Underlying Database Infrastructure
 
MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStore
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
 
How to migrate from Oracle Database with ease
How to migrate from Oracle Database with easeHow to migrate from Oracle Database with ease
How to migrate from Oracle Database with ease
 
MariaDB Platform for hybrid transactional/analytical workloads
MariaDB Platform for hybrid transactional/analytical workloadsMariaDB Platform for hybrid transactional/analytical workloads
MariaDB Platform for hybrid transactional/analytical workloads
 
How QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterHow QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it faster
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX Platform
 
What’s new in Galera 4
What’s new in Galera 4What’s new in Galera 4
What’s new in Galera 4
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
 
What's new in MariaDB Platform X3
What's new in MariaDB Platform X3What's new in MariaDB Platform X3
What's new in MariaDB Platform X3
 
Pgxc scalability pg_open2012
Pgxc scalability pg_open2012Pgxc scalability pg_open2012
Pgxc scalability pg_open2012
 
M|18 How DBAs at TradingScreen Make Life Easier With Automation
M|18 How DBAs at TradingScreen Make Life Easier With AutomationM|18 How DBAs at TradingScreen Make Life Easier With Automation
M|18 How DBAs at TradingScreen Make Life Easier With Automation
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
 
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
 
Managing terabytes: When Postgres gets big
Managing terabytes: When Postgres gets bigManaging terabytes: When Postgres gets big
Managing terabytes: When Postgres gets big
 
Using Pentaho with MariaDB ColumnStore
Using Pentaho with MariaDB ColumnStoreUsing Pentaho with MariaDB ColumnStore
Using Pentaho with MariaDB ColumnStore
 

Similar to M|18 Understanding the Architecture of MariaDB ColumnStore

A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)
PingCAP
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
PingCAP
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreBig Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Matt Stubbs
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
Lviv Startup Club
 
Demystifying MS17-010: Reverse Engineering the ETERNAL Exploits
Demystifying MS17-010: Reverse Engineering the ETERNAL ExploitsDemystifying MS17-010: Reverse Engineering the ETERNAL Exploits
Demystifying MS17-010: Reverse Engineering the ETERNAL Exploits
Priyanka Aash
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdf
ssuser3fb50b
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
Dori Waldman
 
Distributed unique id generation
Distributed unique id generationDistributed unique id generation
Distributed unique id generation
Tung Nguyen
 
Sesión técnica: Big Data Analytics con MariaDB ColumnStore
Sesión técnica: Big Data Analytics con MariaDB ColumnStoreSesión técnica: Big Data Analytics con MariaDB ColumnStore
Sesión técnica: Big Data Analytics con MariaDB ColumnStore
MariaDB plc
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics ToolsBuilding an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Amazon Web Services
 
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...
ScyllaDB
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Deep Dive into DynamoDB
Deep Dive into DynamoDBDeep Dive into DynamoDB
Deep Dive into DynamoDB
AWS Germany
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
NETWAYS
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
linuxlab_conf
 

Similar to M|18 Understanding the Architecture of MariaDB ColumnStore (20)

A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreBig Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
Demystifying MS17-010: Reverse Engineering the ETERNAL Exploits
Demystifying MS17-010: Reverse Engineering the ETERNAL ExploitsDemystifying MS17-010: Reverse Engineering the ETERNAL Exploits
Demystifying MS17-010: Reverse Engineering the ETERNAL Exploits
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdf
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
Distributed unique id generation
Distributed unique id generationDistributed unique id generation
Distributed unique id generation
 
Sesión técnica: Big Data Analytics con MariaDB ColumnStore
Sesión técnica: Big Data Analytics con MariaDB ColumnStoreSesión técnica: Big Data Analytics con MariaDB ColumnStore
Sesión técnica: Big Data Analytics con MariaDB ColumnStore
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics ToolsBuilding an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
 
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Deep Dive into DynamoDB
Deep Dive into DynamoDBDeep Dive into DynamoDB
Deep Dive into DynamoDB
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
 

More from MariaDB plc

MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB plc
 
MariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - Newpharma
MariaDB plc
 
MariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - Cloud
MariaDB plc
 
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB plc
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB plc
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale
MariaDB plc
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB plc
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB plc
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB plc
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB plc
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023
MariaDB plc
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDB
MariaDB plc
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise Server
MariaDB plc
 
Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®
MariaDB plc
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
MariaDB plc
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoring
MariaDB plc
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
MariaDB plc
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
MariaDB plc
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
MariaDB plc
 
What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1
MariaDB plc
 

M|18 Understanding the Architecture of MariaDB ColumnStore

  • 1. MariaDB ColumnStore Understanding the Architecture Andrew Hutchings (LinuxJedi) Lead Software Engineer, MariaDB ColumnStore
  • 2. Who Am I? ● Andrew Hutchings, aka “LinuxJedi” ● Lead Software Engineer for MariaDB’s ColumnStore ● Previously worked for: ○ NGINX - Senior Developer Advocate / Technical Product Manager ○ HP - Principal Software Engineer (HP Cloud / ATG) ○ SkySQL - Senior Sustaining Engineer ○ Rackspace - Senior Software Engineer ○ Sun/Oracle - MySQL Senior Support Engineer ● Co-author of MySQL 5.1 Plugin Development ● IRC/Twitter: LinuxJedi ● Email: linuxjedi@mariadb.com
  • 3. Overview ● History of MariaDB ColumnStore ● Technical Use Case ● Components of MariaDB ColumnStore ● Disk Storage ● Writing Data ● Querying Data ● Optimizing for MariaDB ColumnStore ● Closing Notes ● Questions
  • 4. History of MariaDB ColumnStore ● March 2010 - Calpont launches InfiniDB ● September 2014 - Calpont (now itself called InfiniDB) closes down ○ MariaDB (then SkySQL) supports InfiniDB customers ● April 2016 - MariaDB announces development of MariaDB ColumnStore ● August 2016 - I joined MariaDB and jumped straight into ColumnStore ● December 2016 - MariaDB ColumnStore 1.0 GA ○ InfiniDB + MariaDB 10.1 + Many fixes and improvements ● November 2017 - MariaDB ColumnStore 1.1 GA ○ MariaDB 10.2 + APIs + Even more improvements
  • 6. Technical Use Case MariaDB ColumnStore ● Very large data sets ○ Many columns ○ Many millions of rows ● Complex joins and aggregates ● Rapid bulk data insertion ○ The larger the batch the better Traditional OLTP Engines ● Smaller data sets ● Basic queries ● Lots of DML queries ● Complex data types
  • 7. Data Types ● INT types - range is 2 less from max unsigned or min signed ● CHAR† - max 255 bytes ● VARCHAR† - max 8000 bytes ● DECIMAL - max 18 digits ● DOUBLE/FLOAT ● DATETIME - no sub-seconds (coming in 1.2) ● DATE ● BLOB/TEXT† † Empty string is the same as NULL
  • 8. Other DDL Differences ● No indexes ○ Columns are somewhat self-indexing ● Auto increment is handled differently (a table comment) ● No constraints ● PARTITION syntax not supported ○ Columns are partitioned automatically
  • 9. Row-oriented vs. Column-oriented Format
      Row-oriented:
        ID  Fname     Lname  State  Zip    Phone           Age  Sex
        1   Bugs      Bunny  NY     11217  (718) 938-3235  34   M
        2   Yosemite  Sam    CA     95389  (209) 375-6572  52   M
        3   Daffy     Duck   NY     10013  (212) 227-1810  35   M
        4   Elmer     Fudd   ME     04578  (207) 882-7323  43   M
        5   Witch     Hazel  MA     01970  (978) 744-0991  57   F
      Column-oriented:
        ID     1               2               3               4               5
        Fname  Bugs            Yosemite        Daffy           Elmer           Witch
        Lname  Bunny           Sam             Duck            Fudd            Hazel
        State  NY              CA              NY              ME              MA
        Zip    11217           95389           10013           04578           01970
        Phone  (718) 938-3235  (209) 375-6572  (212) 227-1810  (207) 882-7323  (978) 744-0991
        Age    34              52              35              43              57
        Sex    M               M               M               M               F
      SELECT Fname FROM People WHERE State = 'NY'
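
To make the difference concrete, here is a minimal Python sketch of the query from this slide against both layouts. It is only an illustration: plain lists stand in for on-disk storage, the table is trimmed to four of its columns, and the function names are made up for this example.

```python
# Illustrative only: Python lists stand in for on-disk storage.
rows = [
    (1, "Bugs", "Bunny", "NY"),
    (2, "Yosemite", "Sam", "CA"),
    (3, "Daffy", "Duck", "NY"),
    (4, "Elmer", "Fudd", "ME"),
    (5, "Witch", "Hazel", "MA"),
]

# Column-oriented layout: one list per column, aligned by position.
columns = {
    "ID": [r[0] for r in rows],
    "Fname": [r[1] for r in rows],
    "Lname": [r[2] for r in rows],
    "State": [r[3] for r in rows],
}

def select_fname_row_store(rows, state):
    # Every full row is visited, even though only two fields are needed.
    return [r[1] for r in rows if r[3] == state]

def select_fname_column_store(columns, state):
    # Only the State and Fname columns are read; ID and Lname are never touched.
    matches = [i for i, s in enumerate(columns["State"]) if s == state]
    return [columns["Fname"][i] for i in matches]

print(select_fname_row_store(rows, "NY"))
print(select_fname_column_store(columns, "NY"))
```

Both functions return the same names; the point is that the column-oriented version only reads the two columns the query mentions.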
  • 10. Components of MariaDB ColumnStore
  • 11. ColumnStore Modules ● User Module (UM) ○ MariaDB Server / Storage Engine Plugin ○ ExeMgr ○ DMLProc, DDLProc ● Performance Module (PM) ○ PrimProc ○ WriteEngine ○ ProcMgr / ProcMon ○ DBRM
  • 12. Query Processing [Diagram: SQL arrives at the User Module (UM), which sends column primitives down to the Performance Modules (PM); the PMs send intermediate results back up to the UM. Shared-nothing distributed data storage.]
  • 13. Hardware Requirements ● Lots of RAM ○ minimum 32GB for UM, 16GB for PM ○ minimum 4GB for trying single server out on a VM ● Optimised for HDD spindles, will still work with SSD ○ We are looking into SSD optimisation soon ● More cores typically better ○ 8 core minimum recommendation ● For AWS m4.4xlarge is the recommended minimum
  • 15. Column Types
      1-byte field: 8192 values per 8k block
      2-byte field: 4096 values per 8k block
      4-byte field: 2048 values per 8k block
      8-byte field: 1024 values per 8k block
      Dictionary structure made up of 2 files/extents with:
        • 8-byte fixed length token (pointer).
        • A variable length value stored at the location identified by the pointer.
  • 16. Extent Map
      Object ID                                  The ID for the column (or dictionary)
      Object Type                                Column or Dictionary
      LBID Start / End                           Start / End Logical Block Pointer
      Minimum Value                              Lowest value in the extent
      Maximum Value                              Highest value in the extent
      Width                                      Column Width
      DBRoot                                     DBRoot (disk partition) number
      Partition ID / Segment ID / Block Offset   The extent number
      High Water Mark                            Atomic last block pointer
  • 17. Disk Storage
      ● Blocks (8KB)
      ● Extent1 (8MB~64MB, 8 million rows): Logical Layer
      ● Segment File1 (maps to an Extent): Physical Layer
      ● Compression Chunks
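
The block and extent sizes on slides 15 and 17 follow from simple arithmetic, sketched below in Python. One assumption is made that the slides do not state: "8 million rows" is taken to mean 8 × 1024 × 1024, which is what makes the 8MB and 64MB bounds come out exactly. The function names are illustrative.

```python
BLOCK_BYTES = 8 * 1024               # the "8k block" from the slides
ROWS_PER_EXTENT = 8 * 1024 * 1024    # assumption: "8 million rows" as a power of two

def values_per_block(width_bytes: int) -> int:
    """How many fixed-width values fit in one block."""
    return BLOCK_BYTES // width_bytes

def extent_bytes(width_bytes: int) -> int:
    """Uncompressed size of one extent for a column of the given width."""
    return ROWS_PER_EXTENT * width_bytes

def blocks_per_extent(width_bytes: int) -> int:
    return extent_bytes(width_bytes) // BLOCK_BYTES

for width in (1, 2, 4, 8):
    size_mb = extent_bytes(width) // (1024 * 1024)
    print(width, values_per_block(width), size_mb, blocks_per_extent(width))
```

A 1-byte column gives the 8MB lower bound and an 8-byte column the 64MB upper bound, before compression.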
  • 19. Inserting Data ● Multiple methods ○ Single INSERTs ○ INSERT...SELECT ○ LOAD DATA INFILE ○ cpimport ○ Bulk Write API ● Designed for large bulk inserts ● Inserts are appended at the end of extents (or new extents created) ○ This means reads are not affected ○ A High Water Mark pointing to the last block is moved at the end of the insert
  • 20. cpimport ● Uses CSV files or piped CSV data ● Fastest way to get data into ColumnStore ● Does minimal data conversion and pipes it straight into the PMs ○ Works by appending new blocks to the table and moving an atomic block pointer (HWM) ○ No UNDO log needed (atomic pointer not moved on rollback) ○ Therefore can cause a gap of 0-64KB in a column ● Can load multiple tables simultaneously ● Can load into multiple PMs for the same table simultaneously ● Can load into specific PMs for physical partitioning by PM
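
The append-then-move-the-pointer idea can be sketched as a toy Python model. This is not ColumnStore's implementation: `ColumnFile`, `bulk_append` and the in-memory block list are invented for illustration, and the toy drops uncommitted blocks on rollback, whereas the slide notes that the real engine can leave a small gap.

```python
class ColumnFile:
    """Toy model of one column: a list of blocks plus a high water mark (HWM)."""

    def __init__(self):
        self.blocks = []  # blocks in append order
        self.hwm = 0      # number of blocks visible to readers

    def read(self):
        # Readers never look past the HWM, so a load in progress is invisible.
        return [value for block in self.blocks[: self.hwm] for value in block]

    def bulk_append(self, new_blocks, commit=True):
        self.blocks.extend(new_blocks)
        if commit:
            # Publishing the whole load is a single pointer move.
            self.hwm = len(self.blocks)
        else:
            # Rollback: the HWM is simply not moved, so no undo log is needed.
            # Simplification: this toy discards the unpublished blocks.
            del self.blocks[self.hwm:]

col = ColumnFile()
col.bulk_append([[1, 2], [3]])
col.bulk_append([[99]], commit=False)  # rolled-back load is never visible
col.bulk_append([[4]])
print(col.read())
```

Existing blocks are never rewritten, which is why readers are unaffected while a load is running.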
  • 21. Bulk Write API ● A simple C++ API to inject data into the PMs ○ Bindings in Python and Java available ● Works in a similar way to cpimport ○ Append new blocks and an atomic block pointer (HWM) ● LGPL licensed
  • 22. DML Writes ● Regular INSERT / UPDATE / DELETE ○ Also INSERT...SELECT and LOAD DATA INFILE when autocommit is off ● Slow compared to other engines ○ INSERT is very slow compared to cpimport ● Requires the use of a version buffer for an undo log ○ But INSERT appends to data blocks so no wasted storage ● Data sent to DMLProc to process
  • 23. A Note About DELETE ● Need to touch every column and the undo log ○ So very slow ● Also leaves a gap in the column that won’t be filled ● Having a column that is marked using an UPDATE query is faster ● Dropping entire partitions is instantaneous ○ Partitions can be disabled first
  • 24. INSERT...SELECT / LOAD DATA INFILE ● Injects the binary row data from MariaDB into cpimport ● Good for backwards compatibility with tools and remote loading ● cpimport then injects this data into the column extent files ○ In 1.2 it will use the write API instead ● If autocommit is turned off this will behave like regular DML instead (slow)
  • 26. Physical Execution Layout [Diagram: MariaDB Client; MariaDB Server; two ExeMgr instances (Round Robin); four PrimProc instances]
  • 27. Extent Elimination
      Storage Architecture reduces I/O
        • Only touch column files that are in filter, projection, group by, and join conditions
        • Eliminate disk block touches to partitions outside filter and join conditions
      Each extent is a horizontal partition of 8 million rows:
        Extent 1: ShipDate: 2016-01-12 - 2016-03-05
        Extent 2: ShipDate: 2016-03-05 - 2016-09-23 (ELIMINATED PARTITION)
        Extent 3: ShipDate: 2016-09-24 - 2017-01-06 (ELIMINATED PARTITION)
      SELECT Item, sum(Quantity) FROM Orders
      WHERE ShipDate between '2016-01-01' and '2016-01-31'
      GROUP BY Item
        Id     OrderId  Line  Item     Quantity  Price  Supplier  ShipDate    ShipMode
        1      1        1     Laptop   5         1000   Dell      2016-01-12  G
        2      1        2     Monitor  5         200    LG        2016-01-13  G
        3      2        1     Mouse    1         20     Logitech  2016-02-05  M
        4      3        1     Laptop   3         1600   Apple     2016-01-31  P
        ...    ...      ...   ...      ...       ...    ...       ...         ...
        8M                                                        2016-03-05
        8M+1                                                      2016-03-05
        ...    ...      ...   ...      ...       ...    ...       ...         ...
        16M                                                       2016-09-23
        16M+1                                                     2016-09-24
        ...    ...      ...   ...      ...       ...    ...       ...         ...
        24M                                                       2017-01-06
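
The elimination step amounts to a range-overlap check against the per-extent minimum and maximum values. The Python sketch below uses the ShipDate ranges from this slide; the dictionary layout and the `extents_to_scan` name are assumptions made for illustration, not the engine's data structures.

```python
from datetime import date

# Min/max ShipDate per extent, taken from the slide.
extents = {
    1: (date(2016, 1, 12), date(2016, 3, 5)),
    2: (date(2016, 3, 5), date(2016, 9, 23)),
    3: (date(2016, 9, 24), date(2017, 1, 6)),
}

def extents_to_scan(extents, lo, hi):
    """Return the ids of extents whose [min, max] range overlaps the filter [lo, hi]."""
    return [eid for eid, (mn, mx) in extents.items() if mn <= hi and mx >= lo]

# WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
print(extents_to_scan(extents, date(2016, 1, 1), date(2016, 1, 31)))
```

Only extent 1 overlaps the January filter, so extents 2 and 3 are skipped without reading any of their blocks.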
  • 28. Query Analysis
      MariaDB [tpch1]> select calsettrace(1);
      ...
      MariaDB [tpch1]> select c_count, count(*) as custdist
          -> from ( select c_custkey, count(o_orderkey) c_count
          -> from v_customer left outer join v_orders on c_custkey = o_custkey
          -> and o_comment not like '%special%requests%'
          -> group by c_custkey ) c_orders
          -> group by c_count
          -> order by custdist desc, c_count desc;
      ...
      42 rows in set, 1 warning (9.07 sec)
      MariaDB [tpch1]> select calgetstats()\G
      *************************** 1. row ***************************
      calgetstats(): Query Stats: MaxMemPct-4; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-0; CacheI/O-12503; BlocksTouched-12503; PartitionBlocksEliminated-812; MsgBytesIn-102MB; MsgBytesOut-3KB; Mode-Distributed
      1 row in set (0.00 sec)
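
When comparing runs, it can help to pull the `calgetstats()` output into a structure. The Python sketch below parses the stats string shown on this slide; the `name-value; name-value` format is inferred from this one sample only, and `parse_query_stats` is a hypothetical helper, not part of ColumnStore.

```python
# The stats string exactly as shown on the slide.
stats_line = (
    "Query Stats: MaxMemPct-4; NumTempFiles-0; TempFileSpace-0B; "
    "ApproxPhyI/O-0; CacheI/O-12503; BlocksTouched-12503; "
    "PartitionBlocksEliminated-812; MsgBytesIn-102MB; MsgBytesOut-3KB; "
    "Mode-Distributed"
)

def parse_query_stats(line: str) -> dict:
    """Split 'Query Stats: name-value; name-value; ...' into a dict of strings."""
    _, _, body = line.partition(":")
    stats = {}
    for field in body.split(";"):
        # Split on the first '-' only, so names such as 'ApproxPhyI/O' stay intact.
        name, _, value = field.strip().partition("-")
        stats[name] = value
    return stats

stats = parse_query_stats(stats_line)
print(stats["BlocksTouched"], stats["PartitionBlocksEliminated"], stats["Mode"])
```

Values are kept as strings because some carry units (`102MB`, `3KB`) and others are plain counts.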
  • 29. Query Analysis
      MariaDB [tpch1]> select calgettrace()\G
      *************************** 1. row ***************************
      calgettrace():
      Desc Mode Table               TableOID ReferencedColumns                 PIO LIO   PBE Elapsed Rows
      BPS  PM   customer            7254     (c_custkey)                       0   75    0   0.032   150000
      TNS  UM   -                   -        -                                 -   -     -   0.045   150000
      BPS  PM   customer            7254     (c_custkey)                       0   0     75  0.000   0
      TNS  UM   -                   -        -                                 -   -     -   0.000   0
      TUS  UM   -                   -        -                                 -   -     -   0.303   150000
      BPS  PM   orders              7268     (o_comment,o_custkey,o_orderkey)  0   12428 0   2.293   1500000
      TNS  UM   -                   -        -                                 -   -     -   2.967   1500000
      BPS  PM   orders              7268     (o_comment,o_custkey,o_orderkey)  0   0     737 0.000   0
      TNS  UM   -                   -        -                                 -   -     -   0.000   0
      TUS  UM   -                   -        -                                 -   -     -   3.796   1500000
      HJS  UM   v_customer-v_orders -        -                                 -   -     -   -----   -
      TAS  UM   -                   -        -                                 -   -     -   1.658   150000
      TNS  UM   -                   -        -                                 -   -     -   0.044   150000
      TAS  UM   -                   -        -                                 -   -     -   0.050   42
      1 row in set (0.01 sec)
  • 30. Cross Engine Joins ● Allows non-ColumnStore tables to join with ColumnStore ● The whole query is processed by ColumnStore ● Cross Engine makes new MariaDB connections to retrieve data from non-ColumnStore tables [Diagram: Original Query; Non-ColumnStore Query (Cross Engine); MariaDB Server; ExeMgr]
  • 31. Optimizing for MariaDB ColumnStore
  • 32. Data Modeling ● Star-schema optimizations are generally a good idea ● Conservative data typing is very important ○ Especially around fixed-length vs. dictionary boundary (8 bytes) ○ IP Address vs. IP Number ● Break down compound fields into individual fields: ○ Trivializes searching for sub-fields ○ Can avoid dictionary overhead ○ Cost to re-assemble is generally small
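
As one way to read the "IP Address vs. IP Number" point: a dotted IPv4 string is longer than 8 bytes and so would land in a dictionary column, while the same address as a number fits in a fixed-width integer. The Python sketch below does the conversion with the standard `ipaddress` module before loading; the helper names are illustrative.

```python
import ipaddress

def ip_to_number(ip: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(ip))

def number_to_ip(n: int) -> str:
    """Convert the integer form back to dotted notation for display."""
    return str(ipaddress.IPv4Address(n))

n = ip_to_number("192.168.0.1")
print(n, number_to_ip(n))
```

Since slide 7 notes that the integer range is 2 less than the usual unsigned maximum, the very top of the IPv4 range may need a wider integer type; that is worth checking against your own data.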
  • 33. Data Insertion ● Order data as best you can before inserting ○ Helps extent elimination when min/max range for an extent is small ● Insert in large batches using cpimport or bulk write API
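
Why ordering helps can be seen in a small Python simulation. It uses a deliberately tiny extent of 4 rows (instead of 8 million) and made-up integer values; the function names are assumptions for this sketch. It counts how many extents a range filter must scan when the same values are loaded sorted versus shuffled.

```python
import random

EXTENT_ROWS = 4  # tiny stand-in for the 8 million rows of a real extent

def extent_ranges(values, extent_rows=EXTENT_ROWS):
    """Split values into extents in load order and record (min, max) per extent."""
    chunks = [values[i:i + extent_rows] for i in range(0, len(values), extent_rows)]
    return [(min(chunk), max(chunk)) for chunk in chunks]

def extents_scanned(ranges, lo, hi):
    """Count extents whose min/max range overlaps the filter [lo, hi]."""
    return sum(1 for mn, mx in ranges if mn <= hi and mx >= lo)

values = list(range(1, 17))          # 16 values, e.g. day numbers
shuffled = values[:]
random.Random(0).shuffle(shuffled)   # fixed seed so runs are repeatable

sorted_hits = extents_scanned(extent_ranges(values), 1, 4)
shuffled_hits = extents_scanned(extent_ranges(shuffled), 1, 4)
print(sorted_hits, shuffled_hits)
```

With sorted input each extent covers a narrow, non-overlapping range, so the filter touches a single extent; with shuffled input the ranges widen and more extents typically have to be scanned.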
  • 34. Improving Your Queries ● Avoid filtering on a >= 8byte VARCHAR/CHAR column where possible ○ Two extents need to be read per column, no extent elimination ● Use extent map elimination where possible ● Don’t use a function to filter ○ Extent elimination won’t happen ● Only reference required columns, avoid “SELECT *” ● Use the smallest possible data type for your data ● Avoid large ORDER BY ● Read https://mariadb.com/kb/en/mariadb/columnstore-performance-tuning/
  • 35. Tuning ● Generally self-tuning ○ Uses as much RAM as possible automatically ○ Uses all CPU cores ● More RAM in PMs = more LRU data cache ● More RAM in UMs = ability to process aggregates / joins on bigger data sets ○ Disk joins are possible
  • 37. MariaDB ColumnStore 1.2 (later in 2018) ● MariaDB 10.3 base ● TIME datatype ● Microsecond support ● Improvements to LOAD DATA INFILE and INSERT...SELECT ● Phase 1 of MariaDB ColumnStore Storage Engine Convergence project ● Many other cool things