[DB Written Test & Interview 639] In Oracle, what are multi-column statistics (Extended Statistics)?
- October 10, 2019
- Notes
Question
In Oracle, what are multi-column statistics (Extended Statistics)?
♣
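Answer
Whether the Oracle optimizer estimates cardinality accurately determines whether it can produce the optimal execution plan, and the accuracy of that estimate depends in turn on whether the statistics of the objects referenced in the SQL are complete and reflect the real data distribution. How statistics are gathered therefore matters: tables with heavily skewed data need histograms, and when several columns are also correlated, gathering multi-column statistics (also called extended statistics) on top of that is a better choice.
In most SQL statements the WHERE clause places several conditions on a single table, so rows are filtered on multiple columns. By default, Oracle multiplies the selectivities of the individual columns to obtain the selectivity of the whole WHERE clause. When the columns are correlated, this product can be inaccurate and lead the optimizer to a wrong decision. To let the optimizer estimate accurately and produce a good execution plan, Oracle 11g introduced multi-column statistics. They comprise column group statistics (Column Group Statistics) and expression statistics (Expression Statistics).
Use the function CREATE_EXTENDED_STATS, new in the DBMS_STATS package, to create a virtual column, and then gather statistics on the table. The following defines two extensions: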
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'TEST', TABNAME => 'T', EXTENSION => '(UPPER(PAD))'),
       DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'TEST', TABNAME => 'T', EXTENSION => '(VAL2,VAL3)')
  FROM DUAL;
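The SQL above creates two virtual columns on table T in schema TEST, one based on an expression and one based on a column group. The next time statistics are gathered on the table, the extended statistics are gathered automatically. Note that extended statistics cannot be created on tables owned by SYS; the attempt fails with "ORA-20000: Unable to create extension: not supported for SYS owned table".
To drop extended statistics, use the DROP_EXTENDED_STATS procedure of the built-in DBMS_STATS package: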
EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST', TABNAME => 'T', EXTENSION => '(UPPER(PAD))');
EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST', TABNAME => 'T', EXTENSION => '(VAL2,VAL3)');
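Extended statistics can also be defined directly through the METHOD_OPT parameter when gathering statistics with DBMS_STATS; the column group is then treated as a single column, as shown below: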
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME          => 'SCOTT',
    TABNAME          => 'BOOKS',
    ESTIMATE_PERCENT => 100,
    METHOD_OPT       => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID,RATE_CATEGORY)',
    CASCADE          => TRUE
  );
END;
The view DBA_STAT_EXTENSIONS lists the extended statistics defined in the database:
SQL> SELECT EXTENSION_NAME, EXTENSION
  2    FROM DBA_STAT_EXTENSIONS
  3   WHERE TABLE_NAME='BOOKS';

EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ("HOTEL_ID","RATE_CATEGORY")
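When it is unclear which columns need extended statistics, DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE can determine which column groups a table needs, based on a specific workload. Note that this technique does not cover statistics on expressions. The main steps are: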
EXEC DBMS_STATS.SEED_COL_USAGE(NULL, NULL, TIME_LIMIT => 100);
EXPLAIN PLAN FOR SQL statement;
SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME => 'LHR', TABNAME => 'T_ES_20170601_LHR') FROM DUAL;
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'LHR', TABNAME => 'T_ES_20170601_LHR') FROM DUAL;
A worked example of multi-column statistics follows.
First, create the test table:
DROP TABLE T_ES_20170601_LHR;
CREATE TABLE T_ES_20170601_LHR (C1 NUMBER, C2 VARCHAR2(2), C3 VARCHAR2(20));

DECLARE
BEGIN
  FOR I IN 1 .. 5000 LOOP
    INSERT INTO T_ES_20170601_LHR VALUES (1, 'AA', DBMS_RANDOM.STRING('l', 20));
    INSERT INTO T_ES_20170601_LHR VALUES (2, 'BB', DBMS_RANDOM.STRING('l', 20));
    INSERT INTO T_ES_20170601_LHR VALUES (3, 'CC', DBMS_RANDOM.STRING('l', 20));
    INSERT INTO T_ES_20170601_LHR VALUES (4, 'DD', DBMS_RANDOM.STRING('l', 20));
  END LOOP;
  COMMIT;
END;
/

INSERT INTO T_ES_20170601_LHR VALUES(11,'A','AAAAAAA');
INSERT INTO T_ES_20170601_LHR VALUES(22,'B','BBBBBBB');
INSERT INTO T_ES_20170601_LHR VALUES(33,'C','CCCCCCC');
INSERT INTO T_ES_20170601_LHR VALUES(44,'D','DDDDDDD');
COMMIT;
The data is distributed as follows:
LHR@orclasm > SELECT COUNT(1) FROM T_ES_20170601_LHR;

  COUNT(1)
----------
     20004

LHR@orclasm > SELECT C1,C2,COUNT(1) FROM T_ES_20170601_LHR GROUP BY C1,C2 ORDER BY C1;

        C1 C2   COUNT(1)
---------- -- ----------
         1 AA       5000
         2 BB       5000
         3 CC       5000
         4 DD       5000
        11 A           1
        22 B           1
        33 C           1
        44 D           1

8 rows selected.
Next, gather statistics on T_ES_20170601_LHR without histograms (first confirming that the default ESTIMATE_PERCENT is AUTO_SAMPLE_SIZE):
LHR@orclasm > SELECT DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL) FROM DUAL;

DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL)
-----------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

LHR@orclasm > SET LINESIZE 200
LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR';

OWNER   TABLE_NAME         NUM_DISTINCT SAMPLE_SIZE COLUMN_NAME  HISTOGRAM
------- ------------------ ------------ ----------- ------------ ---------------
LHR     T_ES_20170601_LHR             8       20004 C1           NONE
LHR     T_ES_20170601_LHR             8       20004 C2           NONE
LHR     T_ES_20170601_LHR         20004       20004 C3           NONE
Next, run the following two SQL statements and check the estimated row counts:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';
LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';

  COUNT(*)
----------
      5000

LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';

Explained.

LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3668985715

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |   313 |  1878 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=1 AND "C2"='AA')

LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';

  COUNT(*)
----------
         1

LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';

Explained.

LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3668985715

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |   313 |  1878 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=11 AND "C2"='A')
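The results are:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; -- actual: 5000 rows, estimated: 313
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A'; -- actual: 1 row, estimated: 313
In both queries the cardinality is computed as ROUND(NUM_ROWS*(1/NUM_DISTINCT_C1)*(1/NUM_DISTINCT_C2)) = ROUND(20004*(1/8)*(1/8)) = 313, which matches the 313 in the execution plans. Because no column histograms were gathered, the optimizer's estimate is still far from the actual row count.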
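The arithmetic above can be reproduced with a short Python sketch. It only illustrates the formula quoted in the text and is not Oracle's actual code; the function name estimate_without_histogram is made up for this example.

```python
def estimate_without_histogram(num_rows, distinct_counts):
    """Multiply 1/NUM_DISTINCT for each equality predicate, then round.

    This mirrors the formula in the text and assumes the columns
    are independent of each other.
    """
    selectivity = 1.0
    for ndv in distinct_counts:
        selectivity *= 1.0 / ndv
    return round(num_rows * selectivity)


# NUM_ROWS = 20004 and NUM_DISTINCT = 8 for both C1 and C2,
# taken from the statistics gathered above.
estimate = estimate_without_histogram(20004, [8, 8])
print(estimate)  # 313
```

The result is the same for both predicates, because without histograms only the number of distinct values enters the calculation and the literal values being compared do not.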
Now gather histograms on C1 and C2 and rerun the two queries:
LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS C1 SIZE SKEWONLY,C2 SIZE SKEWONLY');

PL/SQL procedure successfully completed.

LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,DENSITY,NUM_BUCKETS,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR';

OWNER   TABLE_NAME         NUM_DISTINCT    DENSITY NUM_BUCKETS SAMPLE_SIZE COLUMN_NAME  HISTOGRAM
------- ------------------ ------------ ---------- ----------- ----------- ------------ ---------------
LHR     T_ES_20170601_LHR             8 .000024995           8       20004 C1           FREQUENCY
LHR     T_ES_20170601_LHR             8 .000024995           8       20004 C2           FREQUENCY
LHR     T_ES_20170601_LHR         20004  .00004999           1       20004 C3           NONE
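For C1 and C2, DENSITY is computed as 1/(NUM_ROWS*2) = 1/(20004*2) = 0.000024995.
For C3, which has no histogram, DENSITY is computed as 1/NUM_DISTINCT_C3 = 1/20004 ≈ 0.00004999, which matches the value shown in the output above.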
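As a quick check, the two DENSITY formulas can be evaluated in a few lines of Python. This is only a sketch of the arithmetic stated in the text; the variable names are chosen for this example.

```python
NUM_ROWS = 20004          # rows in T_ES_20170601_LHR
NUM_DISTINCT_C3 = 20004   # distinct values in C3

# Columns with a frequency histogram (C1, C2): 1 / (NUM_ROWS * 2)
density_with_frequency_histogram = 1 / (2 * NUM_ROWS)

# Column without a histogram (C3): 1 / NUM_DISTINCT
density_without_histogram = 1 / NUM_DISTINCT_C3

print(f"{density_with_frequency_histogram:.9f}")  # 0.000024995
print(f"{density_without_histogram:.8f}")         # 0.00004999
```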
LHR@orclasm > COL COLUMN_NAME FORMAT A30
LHR@orclasm > COL ENDPOINT_ACTUAL_VALUE FORMAT A50
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > SET PAGESIZE 100
LHR@orclasm > SELECT OWNER,TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ES_20170601_LHR';

OWNER   TABLE_NAME         COLUMN_NAME  ENDPOINT_NUMBER ENDPOINT_VALUE
------- ------------------ ------------ --------------- --------------
LHR     T_ES_20170601_LHR  C1                      5000              1
LHR     T_ES_20170601_LHR  C1                     10000              2
LHR     T_ES_20170601_LHR  C1                     15000              3
LHR     T_ES_20170601_LHR  C1                     20000              4
LHR     T_ES_20170601_LHR  C1                     20001             11
LHR     T_ES_20170601_LHR  C1                     20002             22
LHR     T_ES_20170601_LHR  C1                     20003             33
LHR     T_ES_20170601_LHR  C1                     20004             44
LHR     T_ES_20170601_LHR  C2                         1     3.3750E+35
LHR     T_ES_20170601_LHR  C2                      5001     3.3882E+35
LHR     T_ES_20170601_LHR  C2                      5002     3.4269E+35
LHR     T_ES_20170601_LHR  C2                     10002     3.4403E+35
LHR     T_ES_20170601_LHR  C2                     10003     3.4788E+35
LHR     T_ES_20170601_LHR  C2                     15003     3.4924E+35
LHR     T_ES_20170601_LHR  C2                     15004     3.5308E+35
LHR     T_ES_20170601_LHR  C2                     20004     3.5446E+35
LHR     T_ES_20170601_LHR  C3                         0     3.3882E+35
LHR     T_ES_20170601_LHR  C3                         1     6.3594E+35

18 rows selected.
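In a frequency histogram, ENDPOINT_NUMBER is a running total of rows, so the row count of each value is the difference between consecutive endpoints. The Python sketch below decodes the C1 rows of the output above in that way; the helper name bucket_frequencies is hypothetical and the endpoint list is copied by hand from the query result.

```python
def bucket_frequencies(endpoints):
    """Turn cumulative (endpoint_number, endpoint_value) pairs into
    a {value: row_count} mapping. The input must be sorted by
    endpoint_number, as in the DBA_TAB_HISTOGRAMS output."""
    frequencies = {}
    previous = 0
    for endpoint_number, endpoint_value in endpoints:
        frequencies[endpoint_value] = endpoint_number - previous
        previous = endpoint_number
    return frequencies


# C1 rows from DBA_TAB_HISTOGRAMS: (ENDPOINT_NUMBER, ENDPOINT_VALUE)
c1_endpoints = [
    (5000, 1), (10000, 2), (15000, 3), (20000, 4),
    (20001, 11), (20002, 22), (20003, 33), (20004, 44),
]

c1_frequencies = bucket_frequencies(c1_endpoints)
print(c1_frequencies)
# {1: 5000, 2: 5000, 3: 5000, 4: 5000, 11: 1, 22: 1, 33: 1, 44: 1}
```

The decoded counts agree with the GROUP BY output shown earlier: four values with 5000 rows each and four values with a single row.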
Run the query with "C1=1 AND C2='AA'" as the predicate and see whether the cardinality is now closer to the actual row count:
LHR@orclasm > explain plan for select count(*) from T_ES_20170601_LHR where c1=1 and c2='AA';

Explained.

LHR@orclasm > select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------
Plan hash value: 3668985715

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |  1250 |  7500 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=1 AND "C2"='AA')
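The Rows estimate in the plan is computed as ROUND(NUM_ROWS*(5000/20004)*(5000/20004)) = ROUND(20004*0.062475) = ROUND(1249.75) = 1250. Compared with 313 before the histograms were gathered, this is closer to the actual value of 5000, so the histograms make the estimate more accurate.
Run the query with "C1=11 AND C2='A'" as the predicate and see whether the cardinality is now closer to the actual row count: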
LHR@orclasm > explain plan for select count(*) from T_ES_20170601_LHR where c1=11 and c2='A';

Explained.

LHR@orclasm > select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------
Plan hash value: 3668985715

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |     1 |     6 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=11 AND "C2"='A')
The Rows estimate in the plan is computed as NUM_ROWS*(1/20004)*(1/20004) ≈ 0.00005, which is rounded up to 1.
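Both histogram-based estimates above follow the same pattern: each column contributes its own frequency divided by NUM_ROWS, and the fractions are multiplied as if the columns were independent. The Python sketch below reproduces the two results. It is an illustration of the formulas in the text, not Oracle's implementation; the function name and the floor of 1 row are assumptions based on the plans shown.

```python
def estimate_with_histograms(num_rows, matching_rows_per_column):
    """Multiply per-column selectivities taken from frequency histograms.

    matching_rows_per_column holds, for each predicate column, the number
    of rows whose value equals the literal in the predicate.
    """
    selectivity = 1.0
    for matching_rows in matching_rows_per_column:
        selectivity *= matching_rows / num_rows
    # Assumption: report at least 1 row, as the second plan above does.
    return max(1, round(num_rows * selectivity))


# C1=1 AND C2='AA': 5000 matching rows in each column
print(estimate_with_histograms(20004, [5000, 5000]))  # 1250

# C1=11 AND C2='A': 1 matching row in each column
print(estimate_with_histograms(20004, [1, 1]))        # 1
```

The first estimate is still a quarter of the real 5000 rows, because multiplying the two fractions ignores the fact that C1=1 and C2='AA' always occur together.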
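With histograms the cardinality is closer to the actual value than without them, but a sizeable gap remains, so next we gather multi-column statistics. Multi-column statistics let highly correlated columns be placed in a column group, and statistics are then gathered on that column group. In this example, columns C1 and C2 of T_ES_20170601_LHR are clearly correlated: C1=1 only ever appears in a row together with C2='AA', never with any other C2 value. C1 and C2 can therefore form a column group that yields more precise statistics. There are two ways to gather statistics on a column group:
1. Adopt the recommendations Oracle produces after monitoring the workload, then gather statistics. This suits a DBA who does not know in advance how the table's data is composed or which columns are correlated: Oracle suggests, based on the current workload, which columns of which tables are correlated, and the DBA can create column groups on those columns.
2. Create the column group manually, then gather statistics. This suits a DBA who already knows which columns are correlated.
Both methods are outlined below:
Method 1: generate the column group from the recommendations Oracle produces after monitoring the workload
This method itself offers two options: Oracle can evaluate whether column groups are needed for specific SQL statements, or it can draw already-executed SQL statements from an existing workload, such as the SQL cursor cache or the Automatic Workload Repository (AWR), and evaluate those. For a given table and workload, DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE determine which column groups are needed. This is very useful when it is unclear which columns need extended statistics. Note that the technique does not cover statistics on expressions.
Have Oracle recommend a column group for "SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'":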
LHR@orclasm > EXEC DBMS_STATS.SEED_COL_USAGE(NULL,NULL,TIME_LIMIT=>100);

PL/SQL procedure successfully completed.

LHR@orclasm > EXPLAIN PLAN FOR SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';

Explained.

LHR@orclasm > SET LONG 20000
LHR@orclasm > SET PAGESIZE 100
LHR@orclasm > SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;

DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR LHR.T_ES_20170601_LHR
.............................................

1. C1                                  : EQ
2. C2                                  : EQ
3. (C1, C2)                            : FILTER
###############################################################################
Following the "(C1, C2) : FILTER" recommendation above, create the column group:
LHR@orclasm > SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR')
--------------------------------------------------------------------------------
###############################################################################

EXTENSIONS FOR LHR.T_ES_20170601_LHR
....................................

1. (C1, C2)                            : SYS_STUF3GLKIOP5F4B0BTTCFTMX0W created
###############################################################################
Query DBA_STAT_EXTENSIONS for the column group:
LHR@orclasm > COL EXTENSION FORMAT A50
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > SELECT * FROM DBA_STAT_EXTENSIONS WHERE TABLE_NAME='T_ES_20170601_LHR';

OWNER  TABLE_NAME          EXTENSION_NAME                 EXTENSION       CREATOR DROPPABLE
------ ------------------- ------------------------------ --------------- ------- ---------
LHR    T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2")     USER    YES
"SYS_STUF3GLKIOP5F4B0BTTCFTMX0W" is the system-generated name of the column group. It can be treated as a column of the table, so gather statistics on it:
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > COL EXTENSION FORMAT A15
LHR@orclasm > SELECT T1.OWNER,T1.TABLE_NAME,T1.COLUMN_NAME,T2.EXTENSION,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM FROM DBA_TAB_COL_STATISTICS T1,DBA_STAT_EXTENSIONS T2 WHERE T1.OWNER='LHR' AND T1.TABLE_NAME='T_ES_20170601_LHR' AND T1.OWNER=T2.OWNER AND T1.TABLE_NAME=T2.TABLE_NAME AND T1.COLUMN_NAME=T2.EXTENSION_NAME;

no rows selected

LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS SYS_STUF3GLKIOP5F4B0BTTCFTMX0W SIZE SKEWONLY');

PL/SQL procedure successfully completed.

LHR@orclasm > SELECT T1.OWNER,T1.TABLE_NAME,T1.COLUMN_NAME,T2.EXTENSION,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM FROM DBA_TAB_COL_STATISTICS T1,DBA_STAT_EXTENSIONS T2 WHERE T1.OWNER='LHR' AND T1.TABLE_NAME='T_ES_20170601_LHR' AND T1.OWNER=T2.OWNER AND T1.TABLE_NAME=T2.TABLE_NAME AND T1.COLUMN_NAME=T2.EXTENSION_NAME;

OWNER   TABLE_NAME          COLUMN_NAME                    EXTENSION       NUM_DISTINCT SAMPLE_SIZE HISTOGRAM
------- ------------------- ------------------------------ --------------- ------------ ----------- ---------------
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2")                8       20004 FREQUENCY
Statistics now exist for SYS_STUF3GLKIOP5F4B0BTTCFTMX0W. These are the multi-column statistics (Multicolumn Statistics), also called column group statistics (Column Group Statistics).
Method 2: create the column group manually, then gather statistics with DBMS_STATS.GATHER_TABLE_STATS
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname=>'LHR',tabname=>'T_ES_20170601_LHR',extension=>'(c1,c2)') FROM DUAL;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',EXTENSION=>'(C1,C2)')
-------------------------------------------------------------------
SYS_STU3RTXGYOX7NS$MIUDXQDMQ0C

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS SYS_STU3RTXGYOX7NS$MIUDXQDMQ0C SIZE SKEWONLY');
Alternatively, do it in one step: gather statistics directly on columns (C1, C2), which also creates the column group:
EXEC DBMS_STATS.gather_table_stats('LHR','T_ES_20170601_LHR',method_opt=>'for columns (c1,c2) size skewonly');
First, look at how the data of SYS_STUF3GLKIOP5F4B0BTTCFTMX0W, the column that represents the column group (C1, C2), is distributed in DBA_TAB_HISTOGRAMS:
LHR@orclasm > COL COLUMN_NAME FORMAT A30
LHR@orclasm > COL ENDPOINT_ACTUAL_VALUE FORMAT A50
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > SET PAGESIZE 100
LHR@orclasm > SELECT OWNER,TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ES_20170601_LHR' AND COLUMN_NAME='SYS_STUF3GLKIOP5F4B0BTTCFTMX0W';

OWNER   TABLE_NAME          COLUMN_NAME                    ENDPOINT_NUMBER ENDPOINT_VALUE
------- ------------------- ------------------------------ --------------- --------------
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W               1      716089956
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W            5001     2693090364
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W            5002     3718690277
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           10002     3926166024
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           10003     5232674306
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           15003     5561960012
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           20003     5832235708
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           20004     6322890850

8 rows selected.
With the column group on (C1, C2) in place, the predicted cardinality of "SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';" becomes:
Cardinality=NUM_ROWS*5000/20004=20004*5000/20004=5000
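The prediction can be sketched in Python by treating each (C1, C2) pair as one value with its own frequency, which is what a frequency histogram on the column group records. The pair counts below are copied from the GROUP BY output earlier; the function name and the floor of 1 row are assumptions made for this illustration, not Oracle internals.

```python
NUM_ROWS = 20004

# Row count of every (C1, C2) combination, from the GROUP BY output.
pair_counts = {
    (1, "AA"): 5000, (2, "BB"): 5000, (3, "CC"): 5000, (4, "DD"): 5000,
    (11, "A"): 1, (22, "B"): 1, (33, "C"): 1, (44, "D"): 1,
}


def estimate_with_column_group(c1, c2):
    """Cardinality = NUM_ROWS * (frequency of the pair / NUM_ROWS)."""
    frequency = pair_counts.get((c1, c2), 0)
    # Assumption: never report fewer than 1 row.
    return max(1, round(NUM_ROWS * frequency / NUM_ROWS))


print(estimate_with_column_group(1, "AA"))  # 5000
print(estimate_with_column_group(11, "A"))  # 1
```

Because the pair is counted as a whole, no independence assumption is needed and both estimates equal the actual row counts.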
With the column group statistics in place, rerun the original SQL, "SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';", and see whether they help the optimizer compute a more precise cardinality:
LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';

Explained.

LHR@orclasm > SET LINESIZE 150
LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3668985715

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |  5000 | 30000 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=1 AND "C2"='AA')
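Summary: when a table's data is heavily skewed, histograms go a long way toward helping the optimizer compute an accurate cardinality and avoid a poor execution plan. Going one step further, when several skewed columns appear together in equality conditions in the predicate and are strongly correlated, multi-column statistics with a histogram are an excellent choice and give the optimizer the best chance of predicting the cardinality accurately.
Note:
For more on multi-column statistics, see my blog: http://blog.itpub.net/26736162/viewspace-2139297/
This article is excerpted from 《Oracle程序员面试笔试宝典》 by 小麦苗.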