PostgreSQL 12 新特性解讀之二| CTE支援Inlined With Queries

  • 2019 年 11 月 21 日
  • 筆記

作者介紹

譚峰,網名francs,《PostgreSQL實戰》作者之一,《PostgreSQL 9 Administration Cookbook》譯者之一,PostgreSQL 中文社區委員,致力於PostgreSQL技術分享,部落格 https://postgres.fun

PostgreSQL 的 CTE( common table expressions ) 支援較複雜的查詢,比如遞歸查詢等場景, 12 版本之前 CTE 的 WITH 語句都是直接物化的,也就是說 WITH 語句執行一次並保持到一個類似的臨時表中,供 WITH 語句外層的SQL引用,當 INSERT/UPDATE/DELETE 做CTE的 WITH 語句時是非常恰當的。

PostgreSQL 12 版本的一個重要特性是 CTE 支援 Inlined WITH queries,也就是說當 SELECT 做為CTE 的 WITH 語句時,支援將 WITH 語句中的查詢條件下推到外層SQL中,從而提升 CTE 語句性能。

發行說明

Allow common table expressions (CTE) to be inlined in later parts of the query (Andreas Karlsson, Andrew Gierth, David Fetter, Tom Lane)

Specifically, CTEs are inlined if they are not recursive and are referenced only once later in the query. Inlining can be prevented by specifying MATERIALIZED, and forced by specifying NOT MATERIALIZED. Previously, CTEs were never inlined and were always evaluated before the rest of the query.

根據手冊說明, CTE 的 Inlined WITH Queries 需要滿足以下條件:

1. Are not recursive,非遞歸

2. Are referenced only once later in the query,外層查詢僅調用一次

修補程式說明

Tom Lane 大神提交的修補程式,如下

commit: 608b167f9f9c4553c35bb1ec0eab9ddae643989b  author: Tom Lane <[email protected]>  date: Sat, 16 Feb 2019 16:11:12 -0500  Allow user control of CTE materialization, and change the default behavior.    Historically we haveve always materialized the full output of a CTE query,  treating WITH as an optimization fence (so that, for example, restrictions  from the outer query cannot be pushed into it).  This is appropriate when  the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE  query is non-recursive and side-effect-free, there iss no hazard of changing  the query results by pushing restrictions down.    Another argument for materialization is that it can avoid duplicate  computation of an expensive WITH query --- but that only applies if  the WITH query is called more than once in the outer query.  Even then  it could still be a net loss, if each call has restrictions that  would allow just a small part of the WITH query to be computed.    Hence, let s change the behavior for WITH queries that are non-recursive  and side-effect-free.  By default, we will inline them into the outer  query (removing the optimization fence) if they are called just once.  If they are called more than once, we will keep the old behavior by  default, but the user can override this and force inlining by specifying  NOT MATERIALIZED.  Lastly, the user can force the old behavior by  specifying MATERIALIZED; this would mainly be useful when the query had  deliberately been employing WITH as an optimization fence to prevent a  poor choice of plan.    Andreas Karlsson, Andrew Gierth, David Fetter    Discussion: https://postgr.es/m/[email protected]

根據手冊說明和修補程式說明不一定能很好理解,下面通過實驗演示。

環境準備

測試環境為1台8核16G虛擬機,創建測試表並插入 500 萬數據,如下:

CREATE TABLE t (id int4, name text);    INSERT INTO t(id,name) SELECT n, 'francs_' || n  FROM generate_series(1,5000000) n;    ALTER TABLE t ADD PRIMARY KEY (id);

構造一條 SELECT 做為 CTE 的 WITH 語句,如下:

WITH x AS (  SELECT * FROM t  )  SELECT * FROM x where id=1;

PostgreSQL 11 測試

11 版本測試如下:

postgres=# EXPLAIN ANALYZE WITH x AS (  SELECT * FROM t  )  SELECT * FROM x where id=1;  QUERY PLAN    --------------------------------------------------------------------------  CTE Scan on x  (cost=81838.00..194338.00 rows=25000 width=36) (actual time=0.057..1557.140 rows=1 loops=1)  Filter: (id = 1)  Rows Removed by Filter: 4999999  CTE x  ->  Seq Scan on t  (cost=0.00..81838.00 rows=5000000 width=36) (actual time=0.050..525.329 rows=5000000 loops=1  )  Planning Time: 0.128 ms  Execution Time: 1591.979 ms  (7 rows)    Time: 1592.534 ms (00:01.593)

從執行計劃看出分兩步走,首先是在表 t 上做全表掃描,之後進行 CTE Scan 並過濾 id=1 的記錄,以上並沒有走索引,執行時間為 1592.534 ms,性能很低。

PostgreSQL 12 測試

12 版本測試如下:

postgres=# EXPLAIN ANALYZE WITH x AS (  SELECT * FROM t  )  SELECT * FROM x where id=1;  QUERY PLAN  --------------------------------------------------------------------------  Index Scan using t_pkey on t  (cost=0.43..8.45 rows=1 width=36) (actual time=0.021..0.022 rows=1 loops=1)  Index Cond: (id = 1)  Planning Time: 0.087 ms  Execution Time: 0.043 ms  (4 rows)    Time: 0.469 ms

從執行計劃看出,將 WITH 語句中的條件( id=1 ) 下推到了外層SQL中,這樣直接走了 Index Scan,執行時間降為 0.469 ms,大輻提升了 CTE 語句性能。

支援用戶控制 MATERIALIZED

12 版本用戶可在 CTE 語句中控制是否使用物化(MATERIALIZED)。

以下 CTE 語句設置使用 MATERIALIZED,如下:

postgres=# EXPLAIN ANALYZE WITH x AS MATERIALIZED (  SELECT * FROM t  )  SELECT * FROM x where id=1;  QUERY PLAN  --------------------------------------------------------------------------  CTE Scan on x  (cost=81837.99..194337.97 rows=25000 width=36) (actual time=0.023..1607.012 rows=1 loops=1)  Filter: (id = 1)  Rows Removed by Filter: 4999999  CTE x  ->  Seq Scan on t  (cost=0.00..81837.99 rows=4999999 width=18) (actual time=0.015..556.583 rows=5000000 loops=1)  Planning Time: 0.245 ms  Execution Time: 1640.071 ms  (7 rows)    Time: 1641.072 ms (00:01.641)

以下 CTE 語句設置不使用 MATERIALIZED,如下:

postgres=# EXPLAIN ANALYZE WITH x AS NOT MATERIALIZED (  SELECT * FROM t  )  SELECT * FROM x where id=1;  QUERY PLAN  --------------------------------------------------------------------------  Index Scan using t_pkey on t  (cost=0.43..8.45 rows=1 width=18) (actual time=0.072..0.074 rows=1 loops=1)  Index Cond: (id = 1)  Planning Time: 0.134 ms  Execution Time: 0.099 ms  (4 rows)    Time: 0.801 ms

總結

PostgreSQL 12 版本的 CTE 支援 Inlined WITH Queries 特性,由於 WITH 查詢語句的條件可以外推到外層查詢,避免中間結果數據產生),同時使用相關索引,從而大輻提升 CTE 性能。

參考

1. [Postgres 12 highlight – WITH clause and materialization]

(https://paquier.xyz/postgresql-2/postgres-12-with-materialize/)

2. [PostgreSQL 12 preview – CTE 增強,支援用戶語法層控制 materialized 優化]

(https://github.com/digoal/blog/blob/master/201903/20190309_04.md)

https://postgres.fun/20190807110800.html