PostgreSQL 12 新特性解读之二| CTE支持Inlined With Queries
- 2019 年 11 月 21 日
- 筆記
作者介绍
谭峰,网名francs,《PostgreSQL实战》作者之一,《PostgreSQL 9 Administration Cookbook》译者之一,PostgreSQL 中文社区委员,致力于PostgreSQL技术分享,博客 https://postgres.fun
PostgreSQL 的 CTE( common table expressions ) 支持较复杂的查询,比如递归查询等场景, 12 版本之前 CTE 的 WITH 语句都是直接物化的,也就是说 WITH 语句执行一次并保持到一个类似的临时表中,供 WITH 语句外层的SQL引用,当 INSERT/UPDATE/DELETE 做CTE的 WITH 语句时是非常恰当的。
PostgreSQL 12 版本的一个重要特性是 CTE 支持 Inlined WITH queries,也就是说当 SELECT 做为CTE 的 WITH 语句时,支持将 WITH 语句中的查询条件下推到外层SQL中,从而提升 CTE 语句性能。
发行说明
Allow common table expressions (CTE) to be inlined in later parts of the query (Andreas Karlsson, Andrew Gierth, David Fetter, Tom Lane)
Specifically, CTEs are inlined if they are not recursive and are referenced only once later in the query. Inlining can be prevented by specifying MATERIALIZED, and forced by specifying NOT MATERIALIZED. Previously, CTEs were never inlined and were always evaluated before the rest of the query.
根据手册说明, CTE 的 Inlined WITH Queries 需要满足以下条件:
1. Are not recursive,非递归
2. Are referenced only once later in the query,外层查询仅调用一次
补丁说明
Tom Lane 大神提交的补丁,如下
commit: 608b167f9f9c4553c35bb1ec0eab9ddae643989b author: Tom Lane <[email protected]> date: Sat, 16 Feb 2019 16:11:12 -0500 Allow user control of CTE materialization, and change the default behavior. Historically we haveve always materialized the full output of a CTE query, treating WITH as an optimization fence (so that, for example, restrictions from the outer query cannot be pushed into it). This is appropriate when the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE query is non-recursive and side-effect-free, there iss no hazard of changing the query results by pushing restrictions down. Another argument for materialization is that it can avoid duplicate computation of an expensive WITH query --- but that only applies if the WITH query is called more than once in the outer query. Even then it could still be a net loss, if each call has restrictions that would allow just a small part of the WITH query to be computed. Hence, let s change the behavior for WITH queries that are non-recursive and side-effect-free. By default, we will inline them into the outer query (removing the optimization fence) if they are called just once. If they are called more than once, we will keep the old behavior by default, but the user can override this and force inlining by specifying NOT MATERIALIZED. Lastly, the user can force the old behavior by specifying MATERIALIZED; this would mainly be useful when the query had deliberately been employing WITH as an optimization fence to prevent a poor choice of plan. Andreas Karlsson, Andrew Gierth, David Fetter Discussion: https://postgr.es/m/[email protected]
根据手册说明和补丁说明不一定能很好理解,下面通过实验演示。
环境准备
测试环境为1台8核16G虚拟机,创建测试表并插入 500 万数据,如下:
CREATE TABLE t (id int4, name text); INSERT INTO t(id,name) SELECT n, 'francs_' || n FROM generate_series(1,5000000) n; ALTER TABLE t ADD PRIMARY KEY (id);
构造一条 SELECT 做为 CTE 的 WITH 语句,如下:
WITH x AS ( SELECT * FROM t ) SELECT * FROM x where id=1;
PostgreSQL 11 测试
11 版本测试如下:
postgres=# EXPLAIN ANALYZE WITH x AS ( SELECT * FROM t ) SELECT * FROM x where id=1; QUERY PLAN -------------------------------------------------------------------------- CTE Scan on x (cost=81838.00..194338.00 rows=25000 width=36) (actual time=0.057..1557.140 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 4999999 CTE x -> Seq Scan on t (cost=0.00..81838.00 rows=5000000 width=36) (actual time=0.050..525.329 rows=5000000 loops=1 ) Planning Time: 0.128 ms Execution Time: 1591.979 ms (7 rows) Time: 1592.534 ms (00:01.593)
从执行计划看出分两步走,首先是在表 t 上做全表扫描,之后进行 CTE Scan 并过滤 id=1 的记录,以上并没有走索引,执行时间为 1592.534 ms,性能很低。
PostgreSQL 12 测试
12 版本测试如下:
postgres=# EXPLAIN ANALYZE WITH x AS ( SELECT * FROM t ) SELECT * FROM x where id=1; QUERY PLAN -------------------------------------------------------------------------- Index Scan using t_pkey on t (cost=0.43..8.45 rows=1 width=36) (actual time=0.021..0.022 rows=1 loops=1) Index Cond: (id = 1) Planning Time: 0.087 ms Execution Time: 0.043 ms (4 rows) Time: 0.469 ms
从执行计划看出,将 WITH 语句中的条件( id=1 ) 下推到了外层SQL中,这样直接走了 Index Scan,执行时间降为 0.469 ms,大辐提升了 CTE 语句性能。
支持用户控制 MATERIALIZED
12 版本用户可在 CTE 语句中控制是否使用物化(MATERIALIZED)。
以下 CTE 语句设置使用 MATERIALIZED,如下:
postgres=# EXPLAIN ANALYZE WITH x AS MATERIALIZED ( SELECT * FROM t ) SELECT * FROM x where id=1; QUERY PLAN -------------------------------------------------------------------------- CTE Scan on x (cost=81837.99..194337.97 rows=25000 width=36) (actual time=0.023..1607.012 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 4999999 CTE x -> Seq Scan on t (cost=0.00..81837.99 rows=4999999 width=18) (actual time=0.015..556.583 rows=5000000 loops=1) Planning Time: 0.245 ms Execution Time: 1640.071 ms (7 rows) Time: 1641.072 ms (00:01.641)
以下 CTE 语句设置不使用 MATERIALIZED,如下:
postgres=# EXPLAIN ANALYZE WITH x AS NOT MATERIALIZED ( SELECT * FROM t ) SELECT * FROM x where id=1; QUERY PLAN -------------------------------------------------------------------------- Index Scan using t_pkey on t (cost=0.43..8.45 rows=1 width=18) (actual time=0.072..0.074 rows=1 loops=1) Index Cond: (id = 1) Planning Time: 0.134 ms Execution Time: 0.099 ms (4 rows) Time: 0.801 ms
总结
PostgreSQL 12 版本的 CTE 支持 Inlined WITH Queries 特性,由于 WITH 查询语句的条件可以外推到外层查询,避免中间结果数据产生),同时使用相关索引,从而大辐提升 CTE 性能。
参考
1. [Postgres 12 highlight – WITH clause and materialization]
(https://paquier.xyz/postgresql-2/postgres-12-with-materialize/)
2. [PostgreSQL 12 preview – CTE 增强,支持用户语法层控制 materialized 优化]
(https://github.com/digoal/blog/blob/master/201903/20190309_04.md)
https://postgres.fun/20190807110800.html