PostgreSQL query
planner’s internals
How I Learned to Stop Worrying
and Love the Planner
Alexey Ermakov
2 Why this talk?
• Why is this query so slow?
• Why is the planner not using my index?
• What to do?
3 Where are we going?
• How planner works
• How we can affect its work
• When it can go wrong
• Known limitations
4 The Path of a Query
Rewrite system
Executor ↔ [Workers]
Send results
All in a single process (backend), apart from background workers (parallel seq scan, ...)
5 EXPLAIN command
explain (ANALYZE,VERBOSE,COSTS,BUFFERS,TIMING) select * from t1;
Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8)
(actual time=0.218..2316.688 rows=10000000 loops=1)
Output: f1, f2
Buffers: shared read=44248
I/O Timings: read=322.714
Planning time: 0.024 ms
Execution time: 3852.588 ms
COSTS and TIMING options are on by default
I/O Timings shown when track_io_timing is enabled
6 Planner has to guess
Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8)
• startup cost
• total cost
• rows
• average row width
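The four estimates can be pulled out of a plan line mechanically. The following is a minimal Python sketch, not part of the original talk; the regular expression and the function name `parse_estimates` are illustrative assumptions, and the pattern only targets the plain text format shown on this slide.

```python
import re

# Pattern for the parenthesised estimate block that EXPLAIN prints per node.
ESTIMATE_RE = re.compile(
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)

def parse_estimates(plan_line):
    """Return the planner's four estimates from one EXPLAIN plan line."""
    match = ESTIMATE_RE.search(plan_line)
    if match is None:
        raise ValueError("no cost estimate found in line")
    return {
        "startup_cost": float(match["startup"]),  # cost before the first row
        "total_cost": float(match["total"]),      # cost to return all rows
        "rows": int(match["rows"]),               # estimated row count
        "width": int(match["width"]),             # average row width in bytes
    }

line = "Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8)"
estimates = parse_estimates(line)
print(estimates)
```

Costs are in arbitrary planner units, not milliseconds, so they are only meaningful when compared with other costs.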
7 Cost stability principles
Quote from “Common issues with planner statistics” by Tomas Vondra:
• correlation to query duration: The estimated cost is correlated with
duration of the query, i.e. higher cost means longer execution.
• estimation stability: A small difference in estimation causes only small
difference in costs, i.e. small error in estimation causes only small cost
differences.
• cost stability: Small cost difference means small difference in duration.
• cost comparability: For a given query, two plans with (almost) the same
costs should result in (almost) the same duration.
8 Data retrieval methods
• seq scan – sequential scan of whole table
• index scan – random io (read index + read table)
• index only scan – read only index (9.2+)
• bitmap index scan – something in between seq scan/index scan, possible
to use several indexes at same time in OR/AND conditions
9 Join methods
• nested loop – optimal for small relations
• hash join – optimal for big relations
• merge join – optimal for big relations if they’re sorted
10 Aggregate methods
• aggregate
• hash aggregate
• group aggregate
11 Planner Cost Constants
#seq_page_cost = 1.0 # cost of a sequentially-fetched disk page
#random_page_cost = 4.0 # cost of a non-sequentially-fetched disk page
#cpu_tuple_cost = 0.01 # cost of processing each row during a query
#cpu_index_tuple_cost = 0.005 # cost of processing each index entry
#cpu_operator_cost = 0.0025 # cost of processing each operator or function
so basically cost is just $\sum_i c_i n_i$. How hard could it be?
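As a worked instance of this weighted sum, here is a minimal Python sketch for an unfiltered sequential scan, assuming the default constants listed above. The function name `seq_scan_cost` is hypothetical, and the inputs are the page and row counts reported for the test_ndistinct table later in the talk.

```python
# Default planner cost constants (from the postgresql.conf excerpt above).
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01

def seq_scan_cost(pages, rows):
    """Total cost of an unfiltered sequential scan: sum of c_i * n_i."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

# Inputs taken from the test_ndistinct example later in the talk:
# 35314 pages in the table, 10000067 estimated rows.
cost = seq_scan_cost(pages=35314, rows=10000067)
print(f"{cost:.2f}")  # 135314.67
```

The result agrees with the total cost shown for the Seq Scan node in that example (cost=0.00..135314.67). A filter adds cpu_operator_cost per row on top of this.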
12 Well, kind of hard
• How many rows will we get when we filter the table by this condition?
• How many pages is that? Will we read them sequentially or not?
• How many rows will we get when we join 2 relations?
13 We have stats!
• pg_statistic – only readable by a superuser
• pg_stats view – the same but human-readable and available to all users
(permissions apply)
14 pg_stats
pgday=# \d pg_stats
Column                 | Type     | Description
tablename              | name     | name of the table or functional index
attname                | name     | name of the column or index column
null_frac              | real     | fraction of column entries that are null
avg_width              | integer  | average width in bytes of column’s entries
n_distinct             | real     | number (or fraction of number of rows) of distinct values
most_common_vals       | anyarray | list of the most common values in the column
most_common_freqs      | real[]   | list of the frequencies of the most common values
histogram_bounds       | anyarray | list of intervals with approximately equal population
correlation            | real     | correlation between physical row ordering and logical ordering
most_common_elems      | anyarray |
most_common_elem_freqs | real[]   |
elem_count_histogram   | real[]   |
15 Analyze
table pages (8 kB)
pick 300*stats_target random pages → pick 300*stats_target random rows
Algorithm Z from Vitter, Jeffrey S. (1 March 1985). “Random sampling with a reservoir”
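To illustrate the idea of reservoir sampling, here is a minimal Python sketch. It implements the simpler Algorithm R, not the Algorithm Z cited above; Algorithm Z draws the same kind of uniform sample but skips over records to use fewer random numbers. The function name and the stand-in row stream are assumptions for illustration.

```python
import random

def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

stats_target = 100                    # default_statistics_target
sample_size = 300 * stats_target      # rows ANALYZE aims to sample
rows = range(200_000)                 # stand-in for the table's rows
sample = reservoir_sample(rows, sample_size, random.Random(42))
print(len(sample))  # 30000
```

The sample size depends only on the statistics target, not on the table size, which is why very large tables are described by a proportionally tiny sample.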
16 Analyze
not nulls
MCV list
column values
pcutoff = MIN
17 autoanalyze
• inserted + updated + deleted > threshold ⇒ run autoanalyze
• threshold = autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of rows in the table
• autovacuum_analyze_scale_factor (default = 0.1)
• autovacuum_analyze_threshold (default = 50)
• default_statistics_target (default = 100)
• rows in sample = 300 * stats_target
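The arithmetic on this slide can be sketched in a few lines of Python. This assumes the threshold is autovacuum_analyze_threshold plus autovacuum_analyze_scale_factor times the number of rows in the table, as described in the PostgreSQL documentation; the function names are hypothetical.

```python
def autoanalyze_threshold(reltuples, base_threshold=50, scale_factor=0.1):
    """Number of changed rows after which autoanalyze is triggered."""
    return base_threshold + scale_factor * reltuples

def analyze_sample_rows(stats_target=100):
    """Number of rows ANALYZE samples for a given statistics target."""
    return 300 * stats_target

# A 10M-row table needs about 1M changed rows before autoanalyze runs.
print(autoanalyze_threshold(10_000_000))
# Sample sizes for statistics targets 100 and 50.
print(analyze_sample_rows(100))
print(analyze_sample_rows(50))
```

The two sample sizes, 30000 and 15000 rows, are the ones reported by analyze verbose in the n_distinct example that follows.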
18 n_distinct underestimation example
create extension if not exists tablefunc; -- provides normal_rand()
select setseed(0.5);
create table test_ndistinct as
select (case when random() < 0.1 then f1 end)::int f1
from normal_rand(10000000, 50000, 50000/3) as nr(f1);
10M rows, 90% nulls, ≈ 99.7% of values between 0 and 100000
19 n_distinct underestimation example
# analyze verbose test_ndistinct;
INFO: analyzing "public.test_ndistinct"
INFO: "test_ndistinct": scanned 30000 of 35314 pages, containing 8495268 live rows and 0 dead rows;
30000 rows in sample, 10000067 estimated total rows
select * from pg_stats where tablename = 'test_ndistinct' and attname = 'f1';
null_frac | 0.904067
avg_width | 4
n_distinct | 3080
most_common_vals |
most_common_freqs |
histogram_bounds | {-8505,10072,15513,18933,21260,22574,24082,25695,26953,27898,28645...
correlation | -0.00286606
20 n_distinct underestimation example
# explain analyze select distinct f1 from test_ndistinct ;
HashAggregate (cost=160314.84..160345.64 rows=3080 width=4)
(actual time=2558.751..2581.286 rows=90020 loops=1)
Group Key: f1
-> Seq Scan on test_ndistinct (cost=0.00..135314.67 rows=10000067 width=4)
(actual time=0.045..931.687 rows=10000000 loops=1)
Planning time: 0.048 ms
Execution time: 2586.550 ms
21 n_distinct underestimation example
# set default_statistics_target = 50;
# analyze verbose test_ndistinct;
INFO: analyzing "public.test_ndistinct"
INFO: "test_ndistinct": scanned 15000 of 35314 pages, containing 4247361 live rows and 0 dead rows;
15000 rows in sample, 9999792 estimated total rows
# explain analyze select distinct f1 from test_ndistinct ;
HashAggregate (cost=160311.40..160328.51 rows=1711 width=4)
(actual time=2436.392..2455.851 rows=90020 loops=1)
Group Key: f1
-> Seq Scan on test_ndistinct (cost=0.00..135311.92 rows=9999792 width=4)
(actual time=0.029..892.596 rows=10000000 loops=1)
Planning time: 0.096 ms
Execution time: 2461.160 ms
22 n_distinct underestimation example
# explain analyze select * from test_ndistinct where f1 < 5000;
Seq Scan on test_ndistinct (cost=0.00..160316.36 rows=99 width=4)
(actual time=2.325..1436.792 rows=3480 loops=1)
Filter: (f1 < 5000)
Rows Removed by Filter: 9996520
Planning time: 0.058 ms
Execution time: 1437.424 ms
23 n_distinct underestimation example
alter table test_ndistinct alter column f1 set (n_distinct = 100000);
analyze verbose test_ndistinct;
INFO: analyzing "public.test_ndistinct"
INFO: "test_ndistinct": scanned 15000 of 35314 pages, containing 4247670 live rows and 0 dead rows;
15000 rows in sample, 10000012 estimated total rows
24 n_distinct underestimation example
# explain analyze select distinct f1 from test_ndistinct ;
Unique (cost=1571431.43..1621431.49 rows=100000 width=4)
(actual time=4791.872..7551.150 rows=90020 loops=1)
-> Sort (cost=1571431.43..1596431.46 rows=10000012 width=4)
(actual time=4791.870..6893.413 rows=10000000 loops=1)
Sort Key: f1
Sort Method: external merge Disk: 101648kB
-> Seq Scan on test_ndistinct (cost=0.00..135314.12 rows=10000012 width=4)
(actual time=0.041..938.093 rows=10000000 loops=1)
Planning time: 0.099 ms
Execution time: 7714.701 ms
25 n_distinct underestimation example
set work_mem = '8MB';
# explain analyze select distinct f1 from test_ndistinct ;
HashAggregate (cost=160314.15..161314.15 rows=100000 width=4)
(actual time=2371.902..2391.415 rows=90020 loops=1)
Group Key: f1
-> Seq Scan on test_ndistinct (cost=0.00..135314.12 rows=10000012 width=4)
(actual time=0.093..871.619 rows=10000000 loops=1)
Planning time: 0.048 ms
Execution time: 2396.186 ms
26 n_distinct underestimation example
# explain analyze select * from test_ndistinct where f1 < 5000;
Seq Scan on test_ndistinct (cost=0.00..160316.61 rows=7550 width=4)
(actual time=0.723..839.347 rows=3480 loops=1)
Filter: (f1 < 5000)
Rows Removed by Filter: 9996520
Planning time: 0.262 ms
Execution time: 839.774 ms
27 n_distinct
• n_distinct plays an important role in row estimation when values are not in
the MCV list
• In very big tables it’s possible to underestimate it in some cases
• It’s possible to override n_distinct estimation via
alter table xx alter column yy set (n_distinct = zz);
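To see why a small sample can mislead the estimate, here is a minimal Python sketch of the Haas-Stokes estimator, which, to the best of my knowledge, is the formula ANALYZE uses for n_distinct. The sketch omits the clamping and special cases PostgreSQL applies around it, and all input numbers below are hypothetical.

```python
def estimate_n_distinct(n, N, d, f1):
    """Haas-Stokes estimate of the number of distinct values.

    n  - non-null rows in the sample
    N  - non-null rows in the whole table (estimated)
    d  - distinct values seen in the sample
    f1 - values seen exactly once in the sample
    """
    return n * d / (n - f1 + f1 * n / N)

# Hypothetical sample of 3000 non-null rows from a 1,000,000-row column.
# Every sampled value seen exactly once: the estimate equals N.
print(estimate_n_distinct(n=3000, N=1_000_000, d=3000, f1=3000))
# No value seen exactly once: the estimate collapses to d.
print(estimate_n_distinct(n=3000, N=1_000_000, d=1500, f1=0))
# A few duplicates in the sample already pull the estimate far below N.
print(round(estimate_n_distinct(n=3000, N=1_000_000, d=2900, f1=2805)))
```

The estimate swings between d and N depending on how many sampled values repeat, so a few chance duplicates in a small sample are enough to produce a large underestimate.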
28 default_statistics_target
• Increasing the default_statistics_target setting in the config could help in this case,
but it is not recommended
• The default value of 100 is usually good enough
• When it’s not enough, it is better to increase it on selected columns only via
alter table xx alter column yy set statistics zz;
• Otherwise it could lead to much longer planning time (autoanalyze will
take longer too)
29 high default_statistics_target real example
# show default_statistics_target ;
explain analyze SELECT "seven_charlie"."id" FROM "xray" JOIN "seven_charlie" ON
( "xray"."lima_seven" = "seven_charlie"."lima_seven" ) WHERE
( "xray"."alpha" = '139505' ) AND ( "seven_charlie"."seven_five" IS TRUE );
Nested Loop (cost=0.850..798.110 rows=58 width=4) (actual time=0.081..3.314 rows=169 loops=1)
-> Index Scan using romeo on xray (cost=0.420..205.390 rows=242 width=4) (actual time=0.023..0.79
Index Cond: (alpha = 139505)
-> Index Scan using lima_two on seven_charlie (cost=0.430..2.440 rows=1 width=8) (actual time=0.0
Index Cond: ((lima_seven = papa3six.lima_seven) AND (seven_five = true))
Planning time: 433.630 ms
Execution time: 3.397 ms
30 high default_statistics_target real example
set default_statistics_target = 1000;
# analyze verbose xray;
INFO: analyzing "public.xray"
INFO: "xray": scanned 6760 of 6760 pages, containing 851656 live rows and 2004 dead rows; 300000 row
Nested Loop (cost=0.850..782.110 rows=57 width=4) (actual time=0.066..2.992 rows=169 loops=1)
-> Index Scan using romeo on xray (cost=0.420..199.220 rows=238 width=4) (actual time=0.021..0.70
Index Cond: (alpha = 139505)
-> Index Scan using lima_two on seven_charlie (cost=0.430..2.440 rows=1 width=8) (actual time=0.0
Index Cond: ((lima_seven = papa3six.lima_seven) AND (seven_five = true))
Planning time: 75.196 ms
Execution time: 3.071 ms
31 high default_statistics_target real example
# analyze verbose seven_charlie;
INFO: "seven_charlie": scanned 300000 of 1548230 pages, containing 2184079 live rows and 8293 dead r
Nested Loop (cost=0.850..782.110 rows=65 width=4) (actual time=0.197..26.517 rows=169 loops=1)
-> Index Scan using romeo on xray (cost=0.420..199.220 rows=238 width=4) (actual time=0.064..3.15
Index Cond: (alpha = 139505)
-> Index Scan using lima_two on seven_charlie (cost=0.430..2.440 rows=1 width=8) (actual time=0.0
Index Cond: ((lima_seven = papa3six.lima_seven) AND (seven_five = true))
Planning time: 14.256 ms
Execution time: 26.617 ms
32 Hacks
select name from pg_settings where name ~ 'enable_';
startup_cost += disable_cost
disable_cost = 10^10
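The effect of the penalty can be modelled with a minimal Python sketch. It assumes disable_cost is 1.0e10, as in the PostgreSQL versions this talk covers, and it simplifies the planner to a single comparison of total costs. The path names and cost figures are hypothetical.

```python
DISABLE_COST = 1.0e10  # penalty added to paths of a disabled type

def cheapest_path(paths, disabled=()):
    """Pick the cheapest path after penalising disabled node types.

    paths    - dict mapping a node type to its estimated total cost
    disabled - node types switched off via enable_* = off
    """
    def penalised(item):
        node_type, cost = item
        return cost + (DISABLE_COST if node_type in disabled else 0.0)
    return min(paths.items(), key=penalised)[0]

# Hypothetical costs for two ways to read the same table.
paths = {"seqscan": 1000.0, "indexscan": 4000.0}
print(cheapest_path(paths))                        # seqscan
print(cheapest_path(paths, disabled={"seqscan"}))  # indexscan
# With no alternative, the disabled path is still chosen.
print(cheapest_path({"seqscan": 1000.0}, disabled={"seqscan"}))  # seqscan
```

The last call shows that an enable_* setting discourages a node type rather than forbidding it: the penalised path still wins when nothing else is available.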
33 Hacks
• Very good for testing
• Affects whole query
• Possible to use in functions in some bad cases via
alter function xxx() set enable_? = false
• pg_hint_plan extension (not in contrib), which provides hints
34 Join ordering problem
• There are O(n!) ways to join n relations which grows very fast (10! ≈ 3.6M)
• ORMs like to join everything
• It’s possible to break this number down by using CTEs but be careful
• join_collapse_limit, from_collapse_limit (default 8 relations)
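The growth is easy to check with a short Python sketch that counts only the n! orderings mentioned above. The function name is hypothetical, and the chosen values of n include the default join_collapse_limit (8) and geqo_threshold (12).

```python
from math import factorial

def left_deep_join_orders(n):
    """Number of ways to order n relations in a left-deep join: n!."""
    return factorial(n)

for n in (4, 8, 10, 12):
    print(n, left_deep_join_orders(n))
# 4 24
# 8 40320
# 10 3628800
# 12 479001600
```

Going from 8 to 12 relations multiplies the search space by almost 12000, which is why exhaustive search is capped and a heuristic takes over for larger joins.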
35 Genetic Query Optimizer (geqo)
• geqo_threshold (default 12 relations)
• Chooses a possibly suboptimal plan in reasonable time
• “Mutation” and selection phases
36 Genetic algorithm fun demo
37 What planner can’t do properly
• Estimate number of rows correctly for conditions like “a=x and b=y” where
a and b are statistically dependent
• Use indexes for conditions like created_at + interval '1 day' >= NOW()
• Use index to count distinct values
• Cope with lots of partitions
• Estimate correctly how many rows need to be read when using index scan
on a for “where condition order by a limit n”
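The first limitation comes from treating conditions as independent, so their selectivities are multiplied. Here is a minimal Python sketch with hypothetical numbers, assuming no extended statistics are defined on the columns; the function name is illustrative.

```python
def estimated_rows(total_rows, selectivities):
    """Row estimate when every condition is treated as independent."""
    estimate = total_rows
    for selectivity in selectivities:
        estimate *= selectivity
    return estimate

# Hypothetical table: 1,000,000 rows; "a = x" and "b = y" each match 1% of
# rows, but b is fully determined by a, so both match the same rows.
total_rows = 1_000_000
estimate = estimated_rows(total_rows, [0.01, 0.01])
actual = total_rows * 0.01
print(round(estimate), round(actual))  # 100 10000
```

An underestimate of this size (100x here) can be enough to make a nested loop look cheap when a hash join would be faster.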
38 Query rewriting tricks
• Disable index usage in where clause: a = x => a+0 = x
• Disable index usage in order by clause: order by a => order by a+0
• Restrict push up/pull downs from subquery with offset 0
• Replace left join with exists/not exists to force nested loop
• Move non-limiting join after limit
39 What have we learned?
custom per-column
custom per-table
custom per-tablespace
query rewriting tricks
or session/per database/per user settings
40 Troubleshooting
• Don’t panic!
• Check if planner’s estimates are wrong (off by orders of magnitude)
• Check for missing indexes when a lot of filtering is done
• For complex plans could help
• Extract problem part
• Check for outdated/incomplete stats
• Play with hacks and query rewriting tricks
41 Would you like to know more?
• Robert Haas – The PostgreSQL Query Planner, PostgreSQL East 2010
• Tom Lane – Hacking the Query Planner, PGCon 2011
• Bruce Momjian – Explaining the Postgres Query Optimizer
• PostgreSQL Manual 67.1. Row Estimation Examples
• PostgreSQL Manual 14.1. Using EXPLAIN
• depesz: Implement multivariate n-distinct coefficients
• depesz: Explaining the unexplainable
42 Questions?

Redefining Cybersecurity with AI Capabilities
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
Keynote : Presentation on SASE Technology
Keynote : Presentation on SASE TechnologyKeynote : Presentation on SASE Technology
Keynote : Presentation on SASE Technology
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
What's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptxWhat's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptx
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024
FIDO Munich Seminar Introduction to FIDO.pptx
FIDO Munich Seminar Introduction to FIDO.pptxFIDO Munich Seminar Introduction to FIDO.pptx
FIDO Munich Seminar Introduction to FIDO.pptx
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
Choosing the Best Outlook OST to PST Converter: Key Features and Considerations
Choosing the Best Outlook OST to PST Converter: Key Features and ConsiderationsChoosing the Best Outlook OST to PST Converter: Key Features and Considerations
Choosing the Best Outlook OST to PST Converter: Key Features and Considerations
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx

PostgreSQL query planner's internals

  • 1. PostgreSQL query planner’s internals How I Learned to Stop Worrying and Love the Planner Alexey Ermakov
  • 2. 2 Why this talk? • Why is this query so slow? • Why is the planner not using my index? • What to do?
  • 3. 3 Where are we going? • How the planner works • How we can affect its work • When it can go wrong • Known limitations
  • 4. 4 The Path of a Query Connection ↓ Parser ↓ Rewrite system ↓ Planner/Optimizer ↓ Executor ↔ [Workers] ↓ Send results all in a single process (backend), apart from background workers (parallel seq scan, 9.6+)
  • 5. 5 EXPLAIN command explain (ANALYZE,VERBOSE,COSTS,BUFFERS,TIMING) select * from t1; QUERY PLAN ------------------------------------------------------------------- Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8) (actual time=0.218..2316.688 rows=10000000 loops=1) Output: f1, f2 Buffers: shared read=44248 I/O Timings: read=322.714 Planning time: 0.024 ms Execution time: 3852.588 ms Notes: COSTS and TIMING options are on by default; I/O Timings are shown when track_io_timing is enabled
  • 6. 6 Planner has to guess Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8) • startup cost (0.00) • total cost (104424.80) • rows (10000000) • average row width (8)
  • 7. 7 Cost stability principles Quote from “Common issues with planner statistics” by Tomas Vondra: • correlation to query duration: The estimated cost is correlated with duration of the query, i.e. higher cost means longer execution. • estimation stability: A small difference in estimation causes only small difference in costs, i.e. small error in estimation causes only small cost differences. • cost stability: Small cost difference means small difference in duration. • cost comparability: For a given query, two plans with (almost) the same costs should result in (almost) the same duration.
  • 8. 8 Data retrieval methods • seq scan – sequential scan of whole table • index scan – random I/O (read index + read table) • index only scan – read only index (9.2+) • bitmap index scan – something in between seq scan/index scan, possible to use several indexes at same time in OR/AND conditions
  • 9. 9 Join methods • nested loop – optimal for small relations • hash join – optimal for big relations • merge join – optimal for big relations if they’re sorted
  • 10. 10 Aggregate methods • aggregate • hash aggregate • group aggregate
  • 11. 11 Planner Cost Constants #seq_page_cost = 1.0 # cost of a sequentially-fetched disk page #random_page_cost = 4.0 # cost of a non-sequentially-fetched disk page #cpu_tuple_cost = 0.01 # cost of processing each row during a query #cpu_index_tuple_cost = 0.005 # cost of processing each index entry #cpu_operator_cost = 0.0025 # cost of processing each operator or function so basically cost is just $\sum_i c_i n_i$. How hard could it be?
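
As a worked instance of "cost constants multiplied by counts", the sketch below recomputes a sequential-scan estimate from the default constants. The page and row counts (35314 pages, 10000067 estimated rows) are taken from the test_ndistinct plans later in this talk. The function name `seq_scan_cost` and the `filter_operators` parameter are hypothetical, and the formula is a simplified illustration, not the planner's full cost model.

```python
# Default planner cost constants, as listed on the slide.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages, rows, filter_operators=0):
    """Simplified sequential scan cost: read every page, process every row.

    filter_operators is an illustrative count of operator evaluations
    per row in the WHERE clause (0 means no filter).
    """
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + filter_operators * CPU_OPERATOR_COST)
    return io_cost + cpu_cost


cost = seq_scan_cost(pages=35314, rows=10000067)
print(f"{cost:.2f}")
```

With these inputs the result agrees with the total cost of 135314.67 that EXPLAIN reports for the Seq Scan on test_ndistinct.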
  • 12. 12 Well, kind of hard • How many rows will we get when we filter the table by this condition? • How many pages is that? Will we read them sequentially or not? • How many rows will we get when we join 2 relations?
  • 13. 13 We have stats! • pg_statistic – only readable by a superuser • pg_stats view – the same but human-readable and available to all users (permissions apply)
  • 14. 14 pg_stats pgday=# \d pg_stats Column | Type | ------------------------+----------+---------------------------------------------------------------- tablename | name | name of the table or functional index attname | name | name of the column or index column null_frac | real | fraction of column entries that are null avg_width | integer | average width in bytes of column’s entries n_distinct | real | number (or fraction of number of rows) of distinct values most_common_vals | anyarray | list of the most common values in the column most_common_freqs | real[] | list of the frequencies of the most common values histogram_bounds | anyarray | list of intervals with approximately equal population correlation | real | correlation between physical row ordering and logical ordering most_common_elems | anyarray | most_common_elem_freqs | real[] | elem_count_histogram | real[] |
  • 15. 15 Analyze (diagram) From the table's pages (8 kB each), pick 300*stats_target random pages; from the rows in those pages, pick 300*stats_target random rows (Algorithm Z from Vitter, Jeffrey S. (1 March 1985). “Random sampling with a reservoir”)
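
The row-sampling step can be illustrated with a reservoir sampler. The sketch below uses Algorithm R, the simpler predecessor of the Algorithm Z cited on the slide: it gives the same uniform sample but without Algorithm Z's skipping optimization. The function name `reservoir_sample` and the integer "rows" are hypothetical stand-ins; this is not PostgreSQL's implementation.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen reservoir slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


rng = random.Random(42)
stats_target = 100
# Stand-in for the rows found in the sampled pages.
sample = reservoir_sample(range(200_000), 300 * stats_target, rng)
print(len(sample))
```

A single pass is enough and the total number of rows does not need to be known in advance, which is why this family of algorithms suits sampling while scanning pages.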
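  • 16. 16 Analyze (diagram) Sampled column values are sorted and split into nulls (null_frac) and not nulls; from the non-null values come n_distinct (e.g. -0.2), the MCV list (most_common_vals {1,3,6}, most_common_freqs {0.24,0.24,0.24}) and histogram_bounds {2,5,8,10}. MCV cutoff: $p_{\mathrm{cutoff}} = \min\left(\frac{1}{\mathrm{stats\_target}},\ 1.25\,p_{\mathrm{avg}}\right)$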
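
The negative n_distinct on this slide deserves a note: in pg_stats, a positive value is an absolute count of distinct values, while a negative value is minus the ratio of distinct values to rows. A minimal sketch of that decoding follows; the helper name `distinct_count` and the 1000-row table are hypothetical.

```python
def distinct_count(n_distinct, reltuples):
    """Convert a pg_stats.n_distinct value into an absolute number of distinct values."""
    if n_distinct < 0:
        # Negative: minus the fraction of rows that are distinct,
        # so the estimate scales as the table grows.
        return -n_distinct * reltuples
    # Positive: already an absolute count.
    return n_distinct


# n_distinct = -0.2 on a hypothetical 1000-row table.
print(distinct_count(-0.2, 1000))
# n_distinct = 3080, as in the test_ndistinct example later in the talk.
print(distinct_count(3080, 10_000_067))
```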
  • 17. 17 autoanalyze • inserted + updated + deleted > threshold ⇒ run autoanalyze • threshold = autovacuum_analyze_threshold + reltuples*autovacuum_analyze_scale_factor • autovacuum_analyze_scale_factor (default = 0.1) • autovacuum_analyze_threshold (default = 50) • default_statistics_target (default = 100) • rows in sample = 300 * stats_target
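
The threshold formula above is easy to evaluate by hand. The sketch below does so with the defaults from the slide; the function names are hypothetical, and the 10M-row table size matches the example used later in the talk.

```python
def autoanalyze_threshold(reltuples, base_threshold=50, scale_factor=0.1):
    """threshold = autovacuum_analyze_threshold + reltuples * autovacuum_analyze_scale_factor"""
    return base_threshold + reltuples * scale_factor


def needs_autoanalyze(changed_rows, reltuples):
    """changed_rows = inserted + updated + deleted since the last analyze."""
    return changed_rows > autoanalyze_threshold(reltuples)


def sample_rows(stats_target=100):
    """Number of rows ANALYZE samples for a given statistics target."""
    return 300 * stats_target


print(autoanalyze_threshold(10_000_000))
print(needs_autoanalyze(500_000, 10_000_000))
print(sample_rows())
```

With the defaults, a 10M-row table is re-analyzed only after roughly a million changed rows, which is why a lower per-table autovacuum_analyze_scale_factor is often worth considering for large tables.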
  • 18. 18 n_distinct underestimation example select setseed(0.5); create table test_ndistinct as select (case when random() < 0.1 then f1 end)::int f1 from normal_rand(10000000, 50000, 50000/3) as nr(f1); 10M rows, 90% nulls, ≈ 99.7% of values in between 0..100000
  • 19. 19 n_distinct underestimation example # analyze verbose test_ndistinct; INFO: analyzing "public.test_ndistinct" INFO: "test_ndistinct": scanned 30000 of 35314 pages, containing 8495268 live rows and 0 dead rows; 30000 rows in sample, 10000067 estimated total rows select * from pg_stats where tablename = 'test_ndistinct' and attname = 'f1'; ... null_frac | 0.904067 avg_width | 4 n_distinct | 3080 most_common_vals | most_common_freqs | histogram_bounds | {-8505,10072,15513,18933,21260,22574,24082,25695,26953,27898,28645... correlation | -0.00286606
  • 20. 20 n_distinct underestimation example # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN --------------------------------------------------------------------------------------- HashAggregate (cost=160314.84..160345.64 rows=3080 width=4) (actual time=2558.751..2581.286 rows=90020 loops=1) Group Key: f1 -> Seq Scan on test_ndistinct (cost=0.00..135314.67 rows=10000067 width=4) (actual time=0.045..931.687 rows=10000000 loops=1) Planning time: 0.048 ms Execution time: 2586.550 ms
  • 21. 21 n_distinct underestimation example # set default_statistics_target = 50; # analyze verbose test_ndistinct; INFO: analyzing "public.test_ndistinct" INFO: "test_ndistinct": scanned 15000 of 35314 pages, containing 4247361 live rows and 0 dead rows; 15000 rows in sample, 9999792 estimated total rows # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN --------------------------------------------------------------------------------------- HashAggregate (cost=160311.40..160328.51 rows=1711 width=4) (actual time=2436.392..2455.851 rows=90020 loops=1) Group Key: f1 -> Seq Scan on test_ndistinct (cost=0.00..135311.92 rows=9999792 width=4) (actual time=0.029..892.596 rows=10000000 loops=1) Planning time: 0.096 ms Execution time: 2461.160 ms
  • 22. 22 n_distinct underestimation example # explain analyze select * from test_ndistinct where f1 < 5000; QUERY PLAN --------------------------------------------------------------------------------------- Seq Scan on test_ndistinct (cost=0.00..160316.36 rows=99 width=4) (actual time=2.325..1436.792 rows=3480 loops=1) Filter: (f1 < 5000) Rows Removed by Filter: 9996520 Planning time: 0.058 ms Execution time: 1437.424 ms
  • 23. 23 n_distinct underestimation example alter table test_ndistinct alter column f1 set (n_distinct = 100000); analyze verbose test_ndistinct; INFO: analyzing "public.test_ndistinct" INFO: "test_ndistinct": scanned 15000 of 35314 pages, containing 4247670 live rows and 0 dead rows; 15000 rows in sample, 10000012 estimated total rows ANALYZE
  • 24. 24 n_distinct underestimation example # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN ------------------------------------------------------------------------------------------- Unique (cost=1571431.43..1621431.49 rows=100000 width=4) (actual time=4791.872..7551.150 rows=90020 loops=1) -> Sort (cost=1571431.43..1596431.46 rows=10000012 width=4) (actual time=4791.870..6893.413 rows=10000000 loops=1) Sort Key: f1 Sort Method: external merge Disk: 101648kB -> Seq Scan on test_ndistinct (cost=0.00..135314.12 rows=10000012 width=4) (actual time=0.041..938.093 rows=10000000 loops=1) Planning time: 0.099 ms Execution time: 7714.701 ms
  • 25. 25 n_distinct underestimation example set work_mem = '8MB'; SET # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN ------------------------------------------------------------------------------------------- HashAggregate (cost=160314.15..161314.15 rows=100000 width=4) (actual time=2371.902..2391.415 rows=90020 loops=1) Group Key: f1 -> Seq Scan on test_ndistinct (cost=0.00..135314.12 rows=10000012 width=4) (actual time=0.093..871.619 rows=10000000 loops=1) Planning time: 0.048 ms Execution time: 2396.186 ms
  • 26. 26 n_distinct underestimation example # explain analyze select * from test_ndistinct where f1 < 5000; QUERY PLAN ------------------------------------------------------------------------------------------- Seq Scan on test_ndistinct (cost=0.00..160316.61 rows=7550 width=4) (actual time=0.723..839.347 rows=3480 loops=1) Filter: (f1 < 5000) Rows Removed by Filter: 9996520 Planning time: 0.262 ms Execution time: 839.774 ms
  • 27. 27 n_distinct • n_distinct plays an important role in row estimation when values are not in the MCV list • In very big tables it’s possible to underestimate it in some cases • It’s possible to override the n_distinct estimate via alter table xx alter column yy set (n_distinct = zz);
  • 28. 28 default_statistics_target • Increasing the default_statistics_target setting in the config could help in this case, but is not recommended • The default value of 100 is usually good enough • When it’s not enough, it is better to increase it on selected columns only via alter table xx alter column yy set statistics zz; • Otherwise it could lead to much longer planning time (autoanalyze will take longer too)
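
To see why n_distinct matters for values outside the MCV list, consider an equality condition such as f1 = 12345 (a hypothetical query, not one shown in the talk). For a constant that is not in the MCV list, the planner spreads the remaining non-null, non-MCV fraction evenly over the remaining distinct values. The sketch below applies that rule to the statistics from the test_ndistinct example; the function name is hypothetical, and reusing the same null_frac for both cases is an assumption.

```python
def eq_rows_estimate(reltuples, null_frac, n_distinct, mcv_freqs=()):
    """Estimated rows for `col = const` when const is not in the MCV list."""
    # Fraction of rows that are neither NULL nor covered by the MCV list.
    remaining_frac = 1.0 - null_frac - sum(mcv_freqs)
    # Distinct values left once the MCV entries are set aside.
    remaining_distinct = n_distinct - len(mcv_freqs)
    return reltuples * remaining_frac / remaining_distinct


# Statistics gathered by ANALYZE in the example (no MCV list).
underestimated = eq_rows_estimate(10000067, 0.904067, 3080)
# Same table after overriding n_distinct to 100000.
overridden = eq_rows_estimate(10000067, 0.904067, 100000)
print(round(underestimated), round(overridden))
```

Underestimating n_distinct by a factor of about 30 inflates the per-value row estimate by the same factor, which can be enough to change the chosen plan.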
  • 29. 29 high default_statistics_target real example # show default_statistics_target ; default_statistics_target --------------------------- 6000 explain analyze SELECT "seven_charlie"."id" FROM "xray" JOIN "seven_charlie" ON ( "xray"."lima_seven" = "seven_charlie"."lima_seven" ) WHERE ( "xray"."alpha" = '139505' ) AND ( "seven_charlie"."seven_five" IS TRUE ); Nested Loop (cost=0.850..798.110 rows=58 width=4) (actual time=0.081..3.314 rows=169 loops=1) -> Index Scan using romeo on xray (cost=0.420..205.390 rows=242 width=4) (actual time=0.023..0.79 Index Cond: (alpha = 139505) -> Index Scan using lima_two on seven_charlie (cost=0.430..2.440 rows=1 width=8) (actual time=0.0 Index Cond: ((lima_seven = papa3six.lima_seven) AND (seven_five = true)) Planning time: 433.630 ms Execution time: 3.397 ms
  • 30. 30 high default_statistics_target real example set default_statistics_target = 1000; SET # analyze verbose xray; INFO: analyzing "public.xray" INFO: "xray": scanned 6760 of 6760 pages, containing 851656 live rows and 2004 dead rows; 300000 row ANALYZE Nested Loop (cost=0.850..782.110 rows=57 width=4) (actual time=0.066..2.992 rows=169 loops=1) -> Index Scan using romeo on xray (cost=0.420..199.220 rows=238 width=4) (actual time=0.021..0.70 Index Cond: (alpha = 139505) -> Index Scan using lima_two on seven_charlie (cost=0.430..2.440 rows=1 width=8) (actual time=0.0 Index Cond: ((lima_seven = papa3six.lima_seven) AND (seven_five = true)) Planning time: 75.196 ms Execution time: 3.071 ms
  • 31. 31 high default_statistics_target real example # analyze verbose seven_charlie; INFO: "seven_charlie": scanned 300000 of 1548230 pages, containing 2184079 live rows and 8293 dead r Nested Loop (cost=0.850..782.110 rows=65 width=4) (actual time=0.197..26.517 rows=169 loops=1) -> Index Scan using romeo on xray (cost=0.420..199.220 rows=238 width=4) (actual time=0.064..3.15 Index Cond: (alpha = 139505) -> Index Scan using lima_two on seven_charlie (cost=0.430..2.440 rows=1 width=8) (actual time=0.0 Index Cond: ((lima_seven = papa3six.lima_seven) AND (seven_five = true)) Planning time: 14.256 ms Execution time: 26.617 ms
  • 32. 32 Hacks select name from pg_settings where name ~ 'enable_'; enable_bitmapscan enable_indexscan enable_indexonlyscan enable_seqscan enable_tidscan enable_nestloop enable_hashjoin enable_mergejoin enable_sort enable_hashagg enable_material startup_cost += disable_cost disable_cost = 10^10
  • 33. 33 Hacks • Very good for testing • Affects whole query • Possible to use in functions in some bad cases via alter function xxx() set enable_? = false • pg_hint_plan extension (not in contrib), which provides hints
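
The consequence of "startup_cost += disable_cost" is that a disabled method is penalized rather than forbidden: it still wins if nothing else is available. The toy model below illustrates this; the path names, costs and the `choose_path` helper are all hypothetical, and real path selection considers far more than total cost.

```python
DISABLE_COST = 1.0e10  # penalty added to paths whose enable_* setting is off


def choose_path(paths, disabled=frozenset()):
    """Pick the cheapest path by total cost after applying the penalty.

    paths maps a path name to (startup_cost, total_cost). Because total
    cost includes startup cost, the penalty raises both.
    """
    def penalized_total(name):
        _startup, total = paths[name]
        return total + (DISABLE_COST if name in disabled else 0.0)

    return min(paths, key=penalized_total)


paths = {"seqscan": (0.0, 135314.67), "indexscan": (0.43, 250000.0)}
print(choose_path(paths))                        # cheapest path wins
print(choose_path(paths, disabled={"seqscan"}))  # penalty flips the choice

# With no alternative, the disabled path is still chosen.
print(choose_path({"seqscan": (0.0, 135314.67)}, disabled={"seqscan"}))
```

This is why a plan can still show a Seq Scan after set enable_seqscan = off: a table without a usable index leaves the planner no other option.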
  • 34. 34 Join ordering problem • There are O(n!) ways to join n relations which grows very fast (10! ≈ 3.6M) • ORMs like to join everything • It’s possible to break this number down by using CTEs but be careful • join_collapse_limit, from_collapse_limit (default 8 relations)
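
The factorial growth is easy to tabulate. The sketch below counts only the n! orderings of n relations mentioned on the slide (it ignores bushy join trees and the choice of join method, which make the real search space larger); the function name `join_orderings` is hypothetical.

```python
import math


def join_orderings(n):
    """Number of ways to order n relations in a join: n!"""
    return math.factorial(n)


# 8 is the default join_collapse_limit, 12 the default geqo_threshold.
for n in (4, 8, 10, 12):
    print(n, join_orderings(n))
```

Going from 8 to 12 relations multiplies the count by 11880, which is the motivation for capping exhaustive search with join_collapse_limit and from_collapse_limit.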
  • 35. 35 Genetic Query Optimizer (geqo) • geqo_threshold (default 12 relations) • Chooses a (possibly suboptimal) plan in reasonable time • “Mutation” and selection phases
  • 36. 36 Genetic algorithm fun demo
  • 37. 37 What planner can’t do properly • Estimate number of rows correctly for conditions like “a=x and b=y” where a and b are statistically dependent • Use indexes for conditions like created_at + interval '1 day' >= NOW() • Use index to count distinct values • Cope with lots of partitions • Estimate correctly how many rows need to be read when using index scan on a for “where condition order by a limit n”
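
The first limitation comes from treating per-column selectivities as independent and multiplying them. The sketch below builds a small synthetic table in which column b always equals column a, then compares the multiplied estimate with the true row count. The data, the names and the 100000-row size are all hypothetical.

```python
# Synthetic table: b is fully determined by a (b == a), 100 distinct values.
rows = [(i % 100, i % 100) for i in range(100_000)]
n = len(rows)

# Per-column selectivities, as single-column statistics would see them.
sel_a = sum(1 for a, _ in rows if a == 7) / n
sel_b = sum(1 for _, b in rows if b == 7) / n

# Independence assumption: selectivities are multiplied.
estimated = n * sel_a * sel_b

# What "a = 7 and b = 7" really returns.
actual = sum(1 for a, b in rows if a == 7 and b == 7)

print(estimated, actual)
```

Here the estimate is off by a factor of 100, the kind of order-of-magnitude error that can push the planner toward a nested loop where a hash join would be faster.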
  • 38. 38 Query rewriting tricks • Disable index usage in where clause: a = x => a+0 = x • Disable index usage in order by clause: order by a => order by a+0 • Restrict push up/pull downs from subquery with offset 0 • Replace left join with exists/not exists to force nested loop • Move non-limiting join after limit
  • 39. 39 What have we learned? (diagram: query → Planner → Plan, with Analyze feeding pg_statistic) Inputs to the planner: config: default_statistics_target, autovacuum_analyze_scale_factor; pg_class, pg_attribute: custom per-column stats_target and n_distinct, custom per-table autovacuum_analyze_scale_factor; pg_tablespace: custom per-tablespace seq_page_cost, random_page_cost; pg_index; costs: seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost; memory: work_mem, effective_cache_size; parallel: parallel_setup_cost, parallel_tuple_cost, max_parallel_workers_per_gather; hacks: enable_*; query rewriting tricks; other: from_collapse_limit, join_collapse_limit, geqo*; or session/per database/per user settings
  • 40. 40 Troubleshooting • Don’t panic! • Check if planner’s estimates are wrong (off by orders of magnitude) • Check for missing indexes when a lot of filtering is done • For complex plans could help • Extract problem part • Check for outdated/incomplete stats • Play with hacks and query rewriting tricks
  • 41. 41 Would you like to know more? • Robert Haas – The PostgreSQL Query Planner, PostgreSQL East 2010 • Tom Lane – Hacking the Query Planner, PGCon 2011 • Bruce Momjian – Explaining the Postgres Query Optimizer • PostgreSQL Manual 67.1. Row Estimation Examples • PostgreSQL Manual 14.1. Using EXPLAIN • depesz: Implement multivariate n-distinct coefficients • depesz: Explaining the unexplainable •