【DB笔试面试638】在Oracle中,文本型字段直方图示例2个。

  • 2019 年 10 月 10 日
  • 筆記

题目部分

在Oracle中,文本型字段直方图示例2个。

答案部分

首先准备基础表:

CREATE TABLE T_ST_20170605_LHR(ID NUMBER,STR VARCHAR2(30));  INSERT INTO T_ST_20170605_LHR SELECT ROWNUM ID,1 STR FROM DUAL CONNECT BY LEVEL<=10001;  UPDATE T_ST_20170605_LHR T SET T.STR=6 WHERE T.ID=10001;  EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ST_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS STR SIZE 2');  

查看直方图信息:

LHR@orclasm > COL COLUMN_NAME FORMAT A15  LHR@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ST_20170605_LHR';    COLUMN_NAME     NUM_DISTINCT  NUM_NULLS NUM_BUCKETS HISTOGRAM  --------------- ------------ ---------- ----------- ---------------  STR                        2          0           2 FREQUENCY  LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170605_LHR';    TABLE_NAME                     COLUMN_NAME     ENDPOINT_VALUE ENDPOINT_NUMBER  ------------------------------ --------------- -------------- ---------------  T_ST_20170605_LHR              STR                 2.5442E+35           10000  T_ST_20170605_LHR              STR                 2.8038E+35           10001  

这里的ENDPOINT_VALUE值需要去转换,字符‘1’的16进制的dump值为0x31,字符‘6’的16进制的dump值为0x36,

LHR@orclasm > SELECT DUMP('1',16),DUMP('6',16) FROM DUAL;    DUMP('1',16)     DUMP('6',16)  ---------------- ----------------  Typ=96 Len=1: 31 Typ=96 Len=1: 36  

将0x31右边补0一直补到15个字节(共30位),再将其转换为10进制数,0x36类似,如下所示:

LHR@orclasm > SELECT TO_NUMBER('310000000000000000000000000000','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') C1,TO_NUMBER('360000000000000000000000000000','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') C2 FROM DUAL;            C1         C2  ---------- ----------  2.5442E+35 2.8038E+35  

可以看到转换后的结果和之前查询出来的结果一致。为了方便转换给出如下函数:

CREATE OR REPLACE FUNCTION HEXSTR(P_NUMBER IN NUMBER) RETURN VARCHAR2 AS    L_STR    LONG := TO_CHAR(P_NUMBER, 'fm' || RPAD('x', 50, 'x'));    L_RETURN VARCHAR2(4000);  BEGIN    WHILE (L_STR IS NOT NULL) LOOP      L_RETURN := L_RETURN || CHR(TO_NUMBER(SUBSTR(L_STR, 1, 2), 'xx'));      L_STR    := SUBSTR(L_STR, 3);    END LOOP;      RETURN(SUBSTR(L_RETURN, 1, 6));  END;  

再次查询:

LHR@orclasm > COL ENDPOINT_VALUE2 FORMAT A15  LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER,HEXSTR(ENDPOINT_VALUE) ENDPOINT_VALUE2 FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170605_LHR';    TABLE_NAME                     COLUMN_NAME     ENDPOINT_VALUE ENDPOINT_NUMBER ENDPOINT_VALUE2  ------------------------------ --------------- -------------- --------------- ---------------  T_ST_20170605_LHR              STR                 2.5442E+35           10000 1  T_ST_20170605_LHR              STR                 2.8038E+35           10001 6  

示例2:

准备如下的表:

DROP TABLE T_HG_20170601_LHR;  CREATE TABLE T_HG_20170601_LHR AS SELECT LEVEL RN,'1' NAMES FROM DUAL  D  CONNECT BY LEVEL<=10001;  SELECT COUNT(1) FROM T_HG_20170601_LHR;  UPDATE T_HG_20170601_LHR T SET T.NAMES=2 WHERE T.RN=10001;  SELECT T.NAMES,COUNT(1) FROM T_HG_20170601_LHR T GROUP BY T.NAMES;  CREATE INDEX IDX_NAME ON T_HG_20170601_LHR(NAMES);  EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_HG_20170601_LHR',NO_INVALIDATE => FALSE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1');--不收集直方图  

数据分布情况如下所示:

LHR@orclasm > SELECT T.NAMES,COUNT(1) FROM T_HG_20170601_LHR T GROUP BY T.NAMES;  N   COUNT(1)  - ----------  1      10000  2          1  

NAMES为2的SQL执行计划:

LHR@orclasm > SELECT * FROM T_HG_20170601_LHR T WHERE T.NAMES='2';          RN N  ---------- -       10001 2  Execution Plan  ----------------------------------------------------------  Plan hash value: 2479558392  ---------------------------------------------------------------------------------------  | Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |  ---------------------------------------------------------------------------------------  |   0 | SELECT STATEMENT  |                   |  5001 | 30006 |     7   (0)| 00:00:01 |  |*  1 |  TABLE ACCESS FULL| T_HG_20170601_LHR |  5001 | 30006 |     7   (0)| 00:00:01 |  ---------------------------------------------------------------------------------------  

上述SQL应该走列NAMES上的索引IDX_NAME,但实际上CBO这里却选择了全表扫描。这是因为CBO默认认为列NAMES的数据是均匀分布的,而其实该列上的DISTINCT值只有1和2这两个值,所以CBO评估出来的对列B施加等值查询条件的可选择率就是1/2,进而评估出来的对列B施加等值查询条件的结果集的Cardinality就是5001:

LHR@orclasm > SELECT ROUND(10001*(1/2)) FROM DUAL;  ROUND(10001*(1/2))  ------------------                5001  

正是因为CBO评估出上述等值查询要返回结果集的Cardinality是5001,己经占了表T_HG_20170601_LHR总记录数的一半,所以CBO认为此时再走列B上的索引IDX_NAME就己经不合适了,进而就选择了全表扫描。但实际上,CBO对上述等值查询要返回结果集的Cardinality的评估己经与事实严重不符,评估出来的值是5001,其实却只有1,差了好几个数量级。

对表T_HG_20170601_LHR的列NAMES收集了直方图统计信息后,从如下结果可以看到,此时CBO正确地评估出了返回结果集的Cardinality不是5001而是1,进而就正确地选择了走索引IDX_NAME的执行计划:

LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_HG_20170601_LHR',NO_INVALIDATE => FALSE,METHOD_OPT=>'FOR COLUMNS NAMES SIZE AUTO');  PL/SQL procedure successfully completed.  LHR@orclasm > SELECT * FROM T_HG_20170601_LHR T WHERE T.NAMES='2';          RN N  ---------- -       10001 2  Execution Plan  ----------------------------------------------------------  Plan hash value: 2033494884  -------------------------------------------------------------------------------------------------  | Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |  -------------------------------------------------------------------------------------------------  |   0 | SELECT STATEMENT            |                   |     1 |     6 |     2   (0)| 00:00:01 |  |   1 |  TABLE ACCESS BY INDEX ROWID| T_HG_20170601_LHR |     1 |     6 |     2   (0)| 00:00:01 |  |*  2 |   INDEX RANGE SCAN          | IDX_NAME          |     1 |       |     1   (0)| 00:00:01 |  -------------------------------------------------------------------------------------------------  

& 说明:

有关直方图的更多内容可以参考我的BLOG:http://blog.itpub.net/26736162/viewspace-2139293/

本文选自《Oracle程序员面试笔试宝典》,作者:小麦苗