【DB笔试面试638】在Oracle中,文本型字段直方图示例2个。
- 2019 年 10 月 10 日
- 筆記
题目部分
在Oracle中,文本型字段直方图示例2个。
♣
答案部分
首先准备基础表:
CREATE TABLE T_ST_20170605_LHR(ID NUMBER,STR VARCHAR2(30)); INSERT INTO T_ST_20170605_LHR SELECT ROWNUM ID,1 STR FROM DUAL CONNECT BY LEVEL<=10001; UPDATE T_ST_20170605_LHR T SET T.STR=6 WHERE T.ID=10001; EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ST_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS STR SIZE 2');
查看直方图信息:
LHR@orclasm > COL COLUMN_NAME FORMAT A15 LHR@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ST_20170605_LHR'; COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM --------------- ------------ ---------- ----------- --------------- STR 2 0 2 FREQUENCY LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170605_LHR'; TABLE_NAME COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER ------------------------------ --------------- -------------- --------------- T_ST_20170605_LHR STR 2.5442E+35 10000 T_ST_20170605_LHR STR 2.8038E+35 10001
这里的ENDPOINT_VALUE值需要去转换,字符‘1’的16进制的dump值为0x31,字符‘6’的16进制的dump值为0x36,
LHR@orclasm > SELECT DUMP('1',16),DUMP('6',16) FROM DUAL; DUMP('1',16) DUMP('6',16) ---------------- ---------------- Typ=96 Len=1: 31 Typ=96 Len=1: 36
将0x31右边补0一直补到15个字节(共30位),再将其转换为10进制数,0x36类似,如下所示:
LHR@orclasm > SELECT TO_NUMBER('310000000000000000000000000000','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') C1,TO_NUMBER('360000000000000000000000000000','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') C2 FROM DUAL; C1 C2 ---------- ---------- 2.5442E+35 2.8038E+35
可以看到转换后的结果和之前查询出来的结果一致。为了方便转换给出如下函数:
CREATE OR REPLACE FUNCTION HEXSTR(P_NUMBER IN NUMBER) RETURN VARCHAR2 AS L_STR LONG := TO_CHAR(P_NUMBER, 'fm' || RPAD('x', 50, 'x')); L_RETURN VARCHAR2(4000); BEGIN WHILE (L_STR IS NOT NULL) LOOP L_RETURN := L_RETURN || CHR(TO_NUMBER(SUBSTR(L_STR, 1, 2), 'xx')); L_STR := SUBSTR(L_STR, 3); END LOOP; RETURN(SUBSTR(L_RETURN, 1, 6)); END;
再次查询:
LHR@orclasm > COL ENDPOINT_VALUE2 FORMAT A15 LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER,HEXSTR(ENDPOINT_VALUE) ENDPOINT_VALUE2 FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170605_LHR'; TABLE_NAME COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER ENDPOINT_VALUE2 ------------------------------ --------------- -------------- --------------- --------------- T_ST_20170605_LHR STR 2.5442E+35 10000 1 T_ST_20170605_LHR STR 2.8038E+35 10001 6
示例2:
准备如下的表:
DROP TABLE T_HG_20170601_LHR; CREATE TABLE T_HG_20170601_LHR AS SELECT LEVEL RN,'1' NAMES FROM DUAL D CONNECT BY LEVEL<=10001; SELECT COUNT(1) FROM T_HG_20170601_LHR; UPDATE T_HG_20170601_LHR T SET T.NAMES=2 WHERE T.RN=10001; SELECT T.NAMES,COUNT(1) FROM T_HG_20170601_LHR T GROUP BY T.NAMES; CREATE INDEX IDX_NAME ON T_HG_20170601_LHR(NAMES); EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_HG_20170601_LHR',NO_INVALIDATE => FALSE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1');--不收集直方图
数据分布情况如下所示:
LHR@orclasm > SELECT T.NAMES,COUNT(1) FROM T_HG_20170601_LHR T GROUP BY T.NAMES; N COUNT(1) - ---------- 1 10000 2 1
NAMES为2的SQL执行计划:
LHR@orclasm > SELECT * FROM T_HG_20170601_LHR T WHERE T.NAMES='2'; RN N ---------- - 10001 2 Execution Plan ---------------------------------------------------------- Plan hash value: 2479558392 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5001 | 30006 | 7 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T_HG_20170601_LHR | 5001 | 30006 | 7 (0)| 00:00:01 | ---------------------------------------------------------------------------------------
上述SQL应该走列NAMES上的索引IDX_NAME,但实际上CBO这里却选择了全表扫描。这是因为CBO默认认为列NAMES的数据是均匀分布的,而其实该列上的DISTINCT值只有1和2这两个值,所以CBO评估出来的对列B施加等值查询条件的可选择率就是1/2,进而评估出来的对列B施加等值查询条件的结果集的Cardinality就是5001:
LHR@orclasm > SELECT ROUND(10001*(1/2)) FROM DUAL; ROUND(10001*(1/2)) ------------------ 5001
正是因为CBO评估出上述等值查询要返回结果集的Cardinality是5001,己经占了表T_HG_20170601_LHR总记录数的一半,所以CBO认为此时再走列B上的索引IDX_NAME就己经不合适了,进而就选择了全表扫描。但实际上,CBO对上述等值查询要返回结果集的Cardinality的评估己经与事实严重不符,评估出来的值是5001,其实却只有1,差了好几个数量级。
对表T_HG_20170601_LHR的列NAMES收集了直方图统计信息后,从如下结果可以看到,此时CBO正确地评估出了返回结果集的Cardinality不是5001而是1,进而就正确地选择了走索引IDX_NAME的执行计划:
LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_HG_20170601_LHR',NO_INVALIDATE => FALSE,METHOD_OPT=>'FOR COLUMNS NAMES SIZE AUTO'); PL/SQL procedure successfully completed. LHR@orclasm > SELECT * FROM T_HG_20170601_LHR T WHERE T.NAMES='2'; RN N ---------- - 10001 2 Execution Plan ---------------------------------------------------------- Plan hash value: 2033494884 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T_HG_20170601_LHR | 1 | 6 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_NAME | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------
& 说明:
有关直方图的更多内容可以参考我的BLOG:http://blog.itpub.net/26736162/viewspace-2139293/
本文选自《Oracle程序员面试笔试宝典》,作者:小麦苗