【DB筆試面試633】在Oracle中,什麼是待定的統計資訊(Pending Statistic)?
- 2019 年 10 月 10 日
- 筆記
題目部分
在Oracle中,什麼是待定的統計資訊(Pending Statistic)?
♣
答案部分
在資料庫系統運維中,DBA常常希望維持SQL執行計劃的穩定。很多DBA和開發人員對於Hint的依賴,很大程度上也是源於在CBO情況下,執行計劃對於統計量過於依賴,容易形成不穩定執行計劃。所以,SQL語句執行計劃的穩定性,就變成統計資訊的穩定性問題。更進一步,就是新的統計資訊更新,無論是手動收集還是自動收集,能否促進SQL語句生成更高效的執行計劃。所以,一種思路是:在新的統計資訊收集生成時,暫時不要生效投入執行計劃生成。等待最後確認統計資訊正確之後,再投入生產環境。
在Oracle 11g中,推出了統計資訊管理的一種新技術——待定的統計資訊(Pending Statistic)技術。簡單的說,DBA可以對一系列的數據表設置PENDING屬性。設置PENDING屬性之後,數據的統計資訊在數據字典中相當於已經鎖定。當新的統計資訊生成之後,不是直接替換原有的數據,而是存放在PENDING數據字典中。在PENDING字典中的統計資訊在默認情況下是不會參與SQL執行計劃的生成的。只有在進行SQL測試通過的時候,經過用戶手工的確定,才會將其PUBLISH出來,替換原有的統計資訊。這樣,就給運維DBA提供了一種維持執行計劃穩定的思路。通過固定統計資訊,將新統計資訊以PENDING的方式將原有的統計資訊固定,從而穩定執行計劃。
可以使用如下的SQL語句查詢統計資訊在全局、SCHEMA和表級別是否自動發布(默認情況下都是自動發布):
SELECT DBMS_STATS.GET_PREFS('PUBLISH') GLOBAL,DBMS_STATS.GET_PREFS('PUBLISH','LHR') SCHEMA,DBMS_STATS.GET_PREFS('PUBLISH','LHR','T') TB_LEVEL FROM DUAL;
以上SQL語句的查詢結果返回TRUE或FALSE。TRUE表明收集統計資訊完成後自動發布,而FALSE表明收集統計資訊完成後將待定。可以使用下面的包來改變各個級別的默認PUBLISH選項:
l 全局:EXEC DBMS_STATS.SET_GLOBAL_PREFS(PNAME=>'PUBLISH',PVALUE=>'FALSE');
l SCHEMA:EXEC DBMS_STATS.SET_SCHEMA_PREFS(OWNNAME=>USER,PNAME=>'PUBLISH',PVALUE=>'TRUE');
l 表:EXEC DBMS_STATS.SET_TABLE_PREFS(USER,'T_LHR','PUBLISH','FALSE');
預設情況下,優化器使用數據字典視圖中已發布的統計資訊。如果希望優化器使用新收集的待定統計資訊,那麼可以設置初始化參數OPTIMIZER_USE_PENDING_STATISTICS的值為TRUE(預設值為FALSE)。可以使用下面的SQL語句為一個特定的數據對象發布待定統計資訊:
EXEC DBMS_STATS.PUBLISH_PENDING_STATS('SH','CUSTOMERS');
如果不想發布待定的統計資訊,那麼可以執行下面的語句刪除這些待定的統計資訊:
EXEC DBMS_STATS.DELETE_PENDING_STATS('SH','CUSTOMERS');
可以通過視圖DBA_TAB_STATISTICS和DBA_IND_STATISTICS查詢發布的統計資訊,通過視圖DBA_TAB_PENDING_STATS和DBA_IND_PENDING_STATS查詢待定的統計資訊。可以使用存儲過程DBMS_STATS.EXPORT_PENDING_STATS導出待定的統計資訊。如果已經發布了統計資訊,想要恢復以前的統計資訊,那麼可以根據DBA_TAB_STATS_HISTORY中的STATS_UPDATE_TIME來確定TIMESTAMP,執行下面的操作,最後一個參數AS_OF_TIMESTAMP指的是恢復在這個時間點生效的統計資訊,所以可以多1秒:
LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR'; TABLE_NAME STATS_UPDATE_TIME ------------------------------ ------------------- T_PS_20170605_LHR 2017-06-05 15:54:16 T_PS_20170605_LHR 2017-06-05 16:17:29 LHR@orclasm > EXEC DBMS_STATS.RESTORE_TABLE_STATS(OWNNAME => USER,TABNAME =>'T_PS_20170605_LHR',AS_OF_TIMESTAMP => TO_DATE('2017-06-05 15:54:17','YYYY-MM-DD HH24:MI:SS'));--恢復以前的統計資訊 PL/SQL procedure successfully completed.
下面給出一個使用Pending Statistic的完整示例:
CREATE TABLE T_PS_20170605_LHR AS SELECT LEVEL ID, 'name' || LEVEL NAME FROM DUAL CONNECT BY LEVEL<= 10000 ; CREATE INDEX IDX_T_PS_20170605_LHR_ID ON T_PS_20170605_LHR(ID) ; EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_PS_20170605_LHR') ;
查詢一下歷史統計資訊:
LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR'; TABLE_NAME STATS_UPDATE_TIME ------------------------------ ------------------- T_PS_20170605_LHR 2017-06-05 15:54:16
進行簡單查詢:
LHR@orclasm > SET AUTOT ON LHR@orclasm > SELECT P.ID,P.NAME FROM T_PS_20170605_LHR P WHERE ID=1 ; ID NAME ---------- -------------------------------------------- 1 name1 Execution Plan ---------------------------------------------------------- Plan hash value: 2892875560 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T_PS_20170605_LHR | 1 | 13 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_T_PS_20170605_LHR_ID | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=1) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 596 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed LHR@orclasm >
設定表的PUBLISH選項為FALSE:
LHR@orclasm > EXEC DBMS_STATS.SET_TABLE_PREFS(USER,'T_PS_20170605_LHR', 'PUBLISH', 'FALSE'); PL/SQL procedure successfully completed. LHR@orclasm > SELECT DBMS_STATS.GET_PREFS('PUBLISH',USER,'T_PS_20170605_LHR') FROM DUAL ; DBMS_STATS.GET_PREFS('PUBLISH',USER,'T_PS_20170605_LHR') --------------------------------------------------------------- FALSE
再次向表中插入2W行ID列都為1的數據:
INSERT INTO T_PS_20170605_LHR(ID,NAME) SELECT 1, 'name' || LEVEL FROM DUAL CONNECT BY LEVEL<= 20000 ; COMMIT ;
再次收集一下統計資訊,這個時候收集的統計資訊不會立刻被優化器使用:
LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_PS_20170605_LHR') ; PL/SQL procedure successfully completed. LHR@orclasm > SET AUTOT TRACEONLY LHR@orclasm > SELECT P.ID,P.NAME FROM T_PS_20170605_LHR P WHERE ID=1 ; 20001 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2892875560 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T_PS_20170605_LHR | 1 | 13 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_T_PS_20170605_LHR_ID | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2778 consistent gets 0 physical reads 0 redo size 597478 bytes sent via SQL*Net to client 15182 bytes received via SQL*Net from client 1335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 20001 rows processed
如所料,這裡還是使用舊的統計資訊,依舊使用INDEX RANGE SCAN代價比較高。看一下統計資訊的情況,已經發布的統計資訊還是比較老的,而如下所示PENDING表裡面的統計資訊表示新收集的待定的統計資訊:
LHR@orclasm > SELECT 'publish' AS STAT,T.NUM_ROWS,T.BLOCKS,TO_CHAR(T.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_STATISTICS T WHERE TABLE_NAME='T_PS_20170605_LHR' 2 UNION ALL 3 SELECT 'pending' AS STAT,S.NUM_ROWS,S.BLOCKS,TO_CHAR(S.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_PENDING_STATS S WHERE TABLE_NAME='T_PS_20170605_LHR'; STAT NUM_ROWS BLOCKS TO_CHAR(T.LAST_ANAL ------- ---------- ---------- ------------------- publish 10000 29 2017-06-05 15:54:16 pending 30000 84 2017-06-05 16:07:39
下面來驗證一下新的統計資訊是否有助於改善SQL語句的執行:
LHR@orclasm > ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = TRUE; Session altered. LHR@orclasm > SET AUTOT TRACEONLY LHR@orclasm > SELECT P.ID,P.NAME FROM T_PS_20170605_LHR P WHERE ID=1 ; 20001 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4079616360 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 19488 | 228K| 25 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T_PS_20170605_LHR | 19488 | 228K| 25 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 1414 consistent gets 0 physical reads 0 redo size 533474 bytes sent via SQL*Net to client 15182 bytes received via SQL*Net from client 1335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 20001 rows processed
可以看到,使用優化器使用待定的統計資訊生成的查詢計劃使用的是全表掃描,更加有效率驗證結束,無誤,可以發布新的統計資訊了:
LHR@orclasm > EXEC DBMS_STATS.PUBLISH_PENDING_STATS(USER,'T_PS_20170605_LHR'); PL/SQL procedure successfully completed. LHR@orclasm > ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = FALSE; Session altered. LHR@orclasm > LHR@orclasm > set autot off LHR@orclasm > SELECT 'publish' AS STAT,T.NUM_ROWS,T.BLOCKS,TO_CHAR(T.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_STATISTICS T WHERE TABLE_NAME='T_PS_20170605_LHR' 2 UNION ALL 3 SELECT 'pending' AS STAT,S.NUM_ROWS,S.BLOCKS,TO_CHAR(S.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_PENDING_STATS S WHERE TABLE_NAME='T_PS_20170605_LHR'; STAT NUM_ROWS BLOCKS TO_CHAR(T.LAST_ANAL ------- ---------- ---------- ------------------- publish 30000 84 2017-06-05 16:07:39 LHR@orclasm > LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR'; TABLE_NAME STATS_UPDATE_TIME ------------------------------ ------------------- T_PS_20170605_LHR 2017-06-05 15:54:16 T_PS_20170605_LHR 2017-06-05 16:17:29
可以看到PENDING的統計資訊已經發布並且從USER_TAB_PENDING_STATS中刪除,USER_TAB_STATISTICS表中的LAST_ANALYZED時間顯示的是統計資訊收集的時間。如果已經發布了統計資訊,想要恢復從前的統計資訊,可以根據USER_TAB_STATS_HISTORY中的STATS_UPDATE_TIME來確定TIMESTAMP,執行下面的操作,最後一個參數AS_OF_TIMESTAMP指的是恢復在這個時間點生效的統計資訊嗎,所以可以多1秒:
LHR@orclasm > EXEC DBMS_STATS.RESTORE_TABLE_STATS(OWNNAME => USER,TABNAME =>'T_PS_20170605_LHR',AS_OF_TIMESTAMP => TO_DATE('2017-06-05 15:54:17','YYYY-MM-DD HH24:MI:SS')); PL/SQL procedure successfully completed. LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR'; TABLE_NAME STATS_UPDATE_TIME ------------------------------ ------------------- T_PS_20170605_LHR 2017-06-05 15:54:16 T_PS_20170605_LHR 2017-06-05 16:17:29 T_PS_20170605_LHR 2017-06-05 16:22:20 LHR@orclasm > SELECT 'publish' AS STAT,T.NUM_ROWS,T.BLOCKS,TO_CHAR(T.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_STATISTICS T WHERE TABLE_NAME='T_PS_20170605_LHR' 2 UNION ALL 3 SELECT 'pending' AS STAT,S.NUM_ROWS,S.BLOCKS,TO_CHAR(S.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_PENDING_STATS S WHERE TABLE_NAME='T_PS_20170605_LHR'; STAT NUM_ROWS BLOCKS TO_CHAR(T.LAST_ANAL ------- ---------- ---------- ------------------- publish 10000 29 2017-06-05 15:54:16
& 說明:
有關待定的統計資訊的更多內容可以參考我的BLOG:http://blog.itpub.net/26736162/viewspace-2140300/
本文選自《Oracle程式設計師面試筆試寶典》,作者:小麥苗