分析函數
- 2021 年 2 月 8 日
- 筆記
概念
官方定義:
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.
Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVINGclauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.
Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.
有以下幾個關鍵點:
-
對一組數據進行計算,返回多行
-
不需要進行多表聯合,提高性能
-
在所有表連接和所有WHERE, GROUP BY和HAVING字句之後處理,在ORDER BY子句之前處理
-
只能位於SELECT或者ORDER BY子句
語法
-
常用analytic_function
- AVG,MAX,MIN,SUM,COUNT
- DENSE_RANK,RANK,ROW_NUMBER, CUME_DIST
- LAG,LEAD
- FIRST,LAST
- NTILE
- FIRST_VALUE/LAST_VALUE
- LISTAGG
- RATIO_TO_REPORT
-
arguments個數:0~3
-
arguments類型:數字類型或可以隱式轉為為數字類型的非數字類型
-
analytic_clause
-
在FROM,WHERE,GROUP BY和HAVING子句之後進行計算
-
在SELECT和ORDER BY子句指定帶analytic_clause的分析函數
-
query_partition_clause
-
根據expr對查詢結果進行分組
-
忽略該語句則查詢結果為一個分組
-
分析函數使用上面的分支,不帶括弧
-
Expr可以是常量,欄位,非分析函數,函數表達式
-
-
order_by_clause
- 指定分區中數據的排序方式
- 當排序結果有相同值時:
- DENSE_RANK, RANK返回相同值
- ROW_NUMBER 返回不同值,根據處理行的順序排序
- 限制
- 在分析函數中只能使用expr,position和c_alias無效
- 在分析函數中使用RANGE關鍵字且使用以下窗口就可以使用多個排序鍵
- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(RANGE UNBOUNDED PRECEDING)
- RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- RANGE BETWEEN CURRENT ROW AND CURRENT ROW
- RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
-
windowing_clause
- 支援windowing_clause的分析函數:AVG,MAX,MIN,SUM,COUNT
- ROWS | RANGE
- 為每行定義一個窗口用於計算函數結果
- ROWS:以行指定窗口
- RANGE:以邏輯偏移量指定窗口
- BETWEEN … AND
- 指定窗口的起始點和結束點
- 省略BETWEEN,則指定的點為起始點,結束點默認為當前行(current row)
- 只有指定了order_by_clause才能使用windowing_clause
- 如果省略了windowing_clause,則默認為RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- UNBOUNDED PRECEDING:從分區的第一行開始,起始點
- UNBOUNDED FOLLOWING:到分區的最後一行結束,結束點
- CURRENT ROW
- 作為起始點時,CURRENT ROW指定窗口開始於當前行或者某個值(取決於使用ROW還是RANGE),這時結束點不能是value_expr PRECEDING。
- 作為結束點時,CURRENT ROW指定窗口結束於當前行或者某個值(取決於使用ROW還是RANGE),這時開始點不能是value_expr FOLLOWING。
- value_expr PRECEDING or value_expr FOLLOWING
- 對於RANGE或者ROW
- 如果起始點是value_expr FOLLOWING,則結束點必須是value_expr FOLLOWING
- 如果結束點是value_expr PRECEDING,則起始點必須是value_expr PRECEDING
- 如果指定了ROWS
- value_expr是一個物理偏移量。必須是常量或表達式, 並且必須計算為正數數值
- 如果value_expr是起始點的一部分,則必須位於結束點之前的行
- 如果指定了RANGE
- value_expr是一個邏輯偏移量。必須是一個常量或表達式, 計算結果為正值數值或間隔文本
- 在order_by_clause只能使用一個排序鍵
- 如果value_expr為數值,則ORDER BY expr必須為數字或日期類型
- 如果value_expr為間隔值,則ORDER BY expr必須為日期類型
- 對於RANGE或者ROW
-
分類
Type | Used For |
---|---|
Reporting | Calculating shares, for example, market share. Works with these functions: SUM, AVG, MIN, MAX, COUNT (with/without DISTINCT), VARIANCE, STDDEV, RATIO_TO_REPORT, and new statistical functions. Note that the DISTINCT keyword may be used in those reporting functions that support DISTINCT in aggregate mode. |
Windowing | Calculating cumulative and moving aggregates. Works with these functions: SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, FIRST_VALUE, LAST_VALUE, and new statistical functions. Note that the DISTINCTkeyword is not supported in windowing functions except for MAX and MIN. |
Ranking | Calculating ranks, percentiles, and n-tiles of the values in a result set. |
LAG/LEAD | Finding a value in a row a specified number of rows from a current row. |
FIRST/LAST | First or last value in an ordered group. |
Hypothetical Rank and Distribution | The rank or percentile that a row would have if inserted into a specified data set. |
Reporting
- 查詢人員資訊以及公司平均薪水,最小薪水,最大薪水,薪水總計以及人數
select employee_id,last_name,department_id,salary,
avg(salary) over () avg_sal,
max(salary) over () max_sal,
min(salary) over () min_sal,
sum(salary) over () sum_sal,
count(salary) over () count_sal
from employees order by department_id;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY AVG_SAL MAX_SAL MIN_SAL SUM_SAL COUNT_SAL
----------- --------------- ------------- ---------- ---------- ---------- ---------- ---------- ----------
200 Whalen 10 4400 6461.83178 24000 2100 691416 107
201 Hartstein 20 13000 6461.83178 24000 2100 691416 107
202 Fay 20 6000 6461.83178 24000 2100 691416 107
114 Raphaely 30 11000 6461.83178 24000 2100 691416 107
119 Colmenares 30 2500 6461.83178 24000 2100 691416 107
115 Khoo 30 3100 6461.83178 24000 2100 691416 107
116 Baida 30 2900 6461.83178 24000 2100 691416 107
117 Tobias 30 2800 6461.83178 24000 2100 691416 107
118 Himuro 30 2600 6461.83178 24000 2100 691416 107
203 Mavris 40 6500 6461.83178 24000 2100 691416 107
198 OConnell 50 2600 6461.83178 24000 2100 691416 107
......
- 查詢人員資訊以及各部門平均薪水,最小薪水,最大薪水,薪水總計以及人數
select employee_id,last_name,department_id,salary,
avg(salary) over (partition by department_id) avg_sal,
max(salary) over (partition by department_id) max_sal,
min(salary) over (partition by department_id) min_sal,
sum(salary) over (partition by department_id) sum_sal,
count(salary) over (partition by department_id) count_sal
from employees order by department_id;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY AVG_SAL MAX_SAL MIN_SAL SUM_SAL COUNT_SAL
----------- --------------- ------------- ---------- ---------- ---------- ---------- ---------- ----------
200 Whalen 10 4400 4400 4400 4400 4400 1
201 Hartstein 20 13000 9500 13000 6000 19000 2
202 Fay 20 6000 9500 13000 6000 19000 2
114 Raphaely 30 11000 4150 11000 2500 24900 6
119 Colmenares 30 2500 4150 11000 2500 24900 6
115 Khoo 30 3100 4150 11000 2500 24900 6
116 Baida 30 2900 4150 11000 2500 24900 6
117 Tobias 30 2800 4150 11000 2500 24900 6
118 Himuro 30 2600 4150 11000 2500 24900 6
203 Mavris 40 6500 6500 6500 6500 6500 1
198 OConnell 50 2600 3475.55556 8200 2100 156400 45
......
- 查詢部門最高薪水的員工資訊(不使用分析函數)
select employee_id,last_name,e1.department_id,job_id,salary
from employees e1
where e1.salary=(select max(salary) from employees e2 where e1.department_id=e2.department_id)
order by department_id;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID JOB_ID SALARY
----------- --------------- ------------- ---------- ----------
200 Whalen 10 AD_ASST 4400
201 Hartstein 20 MK_MAN 13000
114 Raphaely 30 PU_MAN 11000
203 Mavris 40 HR_REP 6500
121 Fripp 50 ST_MAN 8200
103 Hunold 60 IT_PROG 9000
204 Baer 70 PR_REP 10000
145 Russell 80 SA_MAN 14000
100 King 90 AD_PRES 24000
108 Greenberg 100 FI_MGR 12008
205 Higgins 110 AC_MGR 12008
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 298340369
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 5 (20)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 44 | 5 (20)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 44 | 5 (20)| 00:00:01 |
| 3 | NESTED LOOPS | | 10 | 44 | 5 (20)| 00:00:01 |
| 4 | VIEW | VW_SQ_1 | 1 | 16 | 4 (25)| 00:00:01 |
|* 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 1 | 7 | 4 (25)| 00:00:01 |
| 7 | TABLE ACCESS FULL | EMPLOYEES | 107 | 749 | 3 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 10 | | 0 (0)| 00:00:01 |
|* 9 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 28 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(MAX("SALARY")>0)
8 - access("E1"."DEPARTMENT_ID"="ITEM_1")
9 - filter("E1"."SALARY"="MAX(SALARY)")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
18 consistent gets
0 physical reads
0 redo size
1178 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
11 rows processed
- 查詢部門最高薪水的員工資訊(使用分析函數)
select emp.*
from (select employee_id,last_name,department_id,job_id,salary,
max(salary) over (partition by department_id) max_sal
from employees
order by department_id) emp
where salary=max_sal
order by department_id;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID JOB_ID SALARY MAX_SAL
----------- --------------- ------------- ---------- ---------- ----------
200 Whalen 10 AD_ASST 4400 4400
201 Hartstein 20 MK_MAN 13000 13000
114 Raphaely 30 PU_MAN 11000 11000
203 Mavris 40 HR_REP 6500 6500
121 Fripp 50 ST_MAN 8200 8200
103 Hunold 60 IT_PROG 9000 9000
204 Baer 70 PR_REP 10000 10000
145 Russell 80 SA_MAN 14000 14000
100 King 90 AD_PRES 24000 24000
108 Greenberg 100 FI_MGR 12008 12008
205 Higgins 110 AC_MGR 12008 12008
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID JOB_ID SALARY MAX_SAL
----------- --------------- ------------- ---------- ---------- ----------
178 Grant SA_REP 7000 7000
12 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 720055818
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 107 | 6848 | 3 (0)| 00:00:01 |
|* 1 | VIEW | | 107 | 6848 | 3 (0)| 00:00:01 |
| 2 | WINDOW SORT | | 107 | 2996 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMPLOYEES | 107 | 2996 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SALARY"="MAX_SAL")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
1312 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
12 rows processed
可以看到使用分析函數的SQL性能有一定提升。
- 查詢人員資訊以及各部門各職位薪水總計和各部門薪水總計
select employee_id,last_name,department_id,job_id,salary,
sum(salary) over (partition by department_id,job_id) job_sal1,
sum(salary) over (partition by department_id) dept_sal2
from employees
order by department_id;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID JOB_ID SALARY JOB_SAL1 DEPT_SAL2
----------- --------------- ------------- ---------- ---------- ---------- ----------
200 Whalen 10 AD_ASST 4400 4400 4400
201 Hartstein 20 MK_MAN 13000 13000 19000
202 Fay 20 MK_REP 6000 6000 19000
118 Himuro 30 PU_CLERK 2600 13900 24900
119 Colmenares 30 PU_CLERK 2500 13900 24900
115 Khoo 30 PU_CLERK 3100 13900 24900
116 Baida 30 PU_CLERK 2900 13900 24900
117 Tobias 30 PU_CLERK 2800 13900 24900
114 Raphaely 30 PU_MAN 11000 11000 24900
203 Mavris 40 HR_REP 6500 6500 6500
198 OConnell 50 SH_CLERK 2600 64300 156400
......
- 查詢各部門各職位薪水總計以及各部門薪水總計
select department_id,job_id,
sum(salary) job_sal1,
sum(sum(salary)) over (partition by department_id) dept_sal2
from employees
group by department_id,job_id
order by department_id;
DEPARTMENT_ID JOB_ID JOB_SAL1 DEPT_SAL2
------------- ---------- ---------- ----------
10 AD_ASST 4400 4400
20 MK_MAN 13000 19000
20 MK_REP 6000 19000
30 PU_CLERK 13900 24900
30 PU_MAN 11000 24900
40 HR_REP 6500 6500
50 SH_CLERK 64300 156400
50 ST_CLERK 55700 156400
50 ST_MAN 36400 156400
60 IT_PROG 28800 28800
70 PR_REP 10000 10000
DEPARTMENT_ID JOB_ID JOB_SAL1 DEPT_SAL2
------------- ---------- ---------- ----------
80 SA_MAN 61000 304500
80 SA_REP 243500 304500
90 AD_PRES 24000 58000
90 AD_VP 34000 58000
100 FI_ACCOUNT 39600 51608
100 FI_MGR 12008 51608
110 AC_ACCOUNT 8300 20308
110 AC_MGR 12008 20308
SA_REP 7000 7000
20 rows selected.
- 查詢各職位薪水總計占所在部門薪水總計超過50%的職位
select emp.*,100 * round(job_sal1/dept_sal2, 2)||'%' Percent
from (select department_id,job_id,
sum(salary) job_sal1,
sum(sum(salary)) over (partition by department_id) dept_sal2
from employees
group by department_id,job_id) emp
where job_sal1>dept_sal2*0.5;
DEPARTMENT_ID JOB_ID JOB_SAL1 DEPT_SAL2 PERCENT
------------- ---------- ---------- ---------- -----------------------------------------
10 AD_ASST 4400 4400 100%
20 MK_MAN 13000 19000 68%
30 PU_CLERK 13900 24900 56%
40 HR_REP 6500 6500 100%
60 IT_PROG 28800 28800 100%
70 PR_REP 10000 10000 100%
80 SA_REP 243500 304500 80%
90 AD_VP 34000 58000 59%
100 FI_ACCOUNT 39600 51608 77%
110 AC_MGR 12008 20308 59%
SA_REP 7000 7000 100%
11 rows selected.
- 查詢各職位薪水總計占所在部門薪水總計超過50%的職位(使用ratio_to_report函數)
select emp.*
from (select department_id,job_id,
sum(salary) job_sal1,
sum(sum(salary)) over (partition by department_id) dept_sal2,
ratio_to_report(sum(salary)) over (partition by department_id) job_to_dept_sal3
from employees
group by department_id,job_id) emp
where job_to_dept_sal3>0.5;
DEPARTMENT_ID JOB_ID JOB_SAL1 DEPT_SAL2 JOB_TO_DEPT_SAL3
------------- ---------- ---------- ---------- ----------------
10 AD_ASST 4400 4400 1
20 MK_MAN 13000 19000 .684210526
30 PU_CLERK 13900 24900 .558232932
40 HR_REP 6500 6500 1
60 IT_PROG 28800 28800 1
70 PR_REP 10000 10000 1
80 SA_REP 243500 304500 .799671593
90 AD_VP 34000 58000 .586206897
100 FI_ACCOUNT 39600 51608 .767322896
110 AC_MGR 12008 20308 .591294071
SA_REP 7000 7000 1
11 rows selected.
- 查詢每個人的薪水占部門薪水合計及公司薪水總計的百分比(使用ratio_to_report函數)
select employee_id,last_name,department_id,hire_date,salary,
ratio_to_report(salary) over(partition by department_id) as pct1,
ratio_to_report(salary) over() as pct2
from employees;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID HIRE_DATE SALARY PCT1 PCT2
----------- --------------- ------------- ------------------ ---------- ---------- ----------
200 Whalen 10 17-SEP-03 4400 1 .006363752
201 Hartstein 20 17-FEB-04 13000 .684210526 .018801995
202 Fay 20 17-AUG-05 6000 .315789474 .008677844
114 Raphaely 30 07-DEC-02 11000 .441767068 .01590938
119 Colmenares 30 10-AUG-07 2500 .100401606 .003615768
115 Khoo 30 18-MAY-03 3100 .124497992 .004483553
116 Baida 30 24-DEC-05 2900 .116465863 .004194291
117 Tobias 30 24-JUL-05 2800 .112449799 .00404966
118 Himuro 30 15-NOV-06 2600 .104417671 .003760399
203 Mavris 40 07-JUN-02 6500 1 .009400997
198 OConnell 50 21-JUN-07 2600 .016624041 .003760399
......
Windowing
-
Cumulative Aggregate Function
- 查詢按部門的薪水合計及公司薪水總計
select employee_id,last_name,department_id,salary, sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) dept_sal_cum1, sum(salary) over (order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) dept_sal_cum2 from employees; EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2 ----------- --------------- ------------- ---------- ------------- ------------- 200 Whalen 10 4400 4400 691416 201 Hartstein 20 13000 19000 691416 202 Fay 20 6000 19000 691416 114 Raphaely 30 11000 24900 691416 119 Colmenares 30 2500 24900 691416 115 Khoo 30 3100 24900 691416 116 Baida 30 2900 24900 691416 117 Tobias 30 2800 24900 691416 118 Himuro 30 2600 24900 691416 203 Mavris 40 6500 6500 691416 198 OConnell 50 2600 156400 691416 ......
和以下SQL等價:
select employee_id,last_name,department_id,salary, sum(salary) over (partition by department_id) dept_sal_cum1, sum(salary) over () dept_sal_cum2 from employees;
- 查詢按部門的薪水累計及不按部門的薪水累計
select employee_id,last_name,department_id,salary, sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) dept_sal_cum1, sum(salary) over (order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) dept_sal_cum2 from employees; EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2 ----------- --------------- ------------- ---------- ------------- ------------- 200 Whalen 10 4400 4400 4400 201 Hartstein 20 13000 13000 17400 202 Fay 20 6000 19000 23400 114 Raphaely 30 11000 11000 34400 119 Colmenares 30 2500 13500 36900 115 Khoo 30 3100 16600 40000 116 Baida 30 2900 19500 42900 117 Tobias 30 2800 22300 45700 118 Himuro 30 2600 24900 48300 203 Mavris 40 6500 6500 54800 198 OConnell 50 2600 2600 57400 ......
和以下SQL等價:
select employee_id,last_name,department_id,salary, sum(salary) over (partition by department_id order by department_id ROWS UNBOUNDED PRECEDING) dept_sal_cum1, sum(salary) over (order by department_id ROWS UNBOUNDED PRECEDING) dept_sal_cum2 from employees;
- 查詢按部門分區從分區第一行到本行前一行的累計和到本行後一行的累計
select employee_id,last_name,department_id,salary, sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) dept_sal_cum1, sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) dept_sal_cum2 from employees; EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2 ----------- --------------- ------------- ---------- ------------- ------------- 200 Whalen 10 4400 4400 201 Hartstein 20 13000 19000 202 Fay 20 6000 13000 19000 114 Raphaely 30 11000 13500 119 Colmenares 30 2500 11000 16600 115 Khoo 30 3100 13500 19500 116 Baida 30 2900 16600 22300 117 Tobias 30 2800 19500 24900 118 Himuro 30 2600 22300 24900 203 Mavris 40 6500 6500 198 OConnell 50 2600 5200 ......
-
Moving Aggregate Function
- 查詢按部門分區從分區前一行到本行的累計以及到本行到後一行的累計
select employee_id,last_name,department_id,salary, sum(salary) over (partition by department_id order by department_id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) dept_sal_cum1, sum(salary) over (partition by department_id order by department_id ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) dept_sal_cum2 from employees; EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2 ----------- --------------- ------------- ---------- ------------- ------------- 200 Whalen 10 4400 4400 4400 201 Hartstein 20 13000 13000 19000 202 Fay 20 6000 19000 6000 114 Raphaely 30 11000 11000 13500 119 Colmenares 30 2500 13500 5600 115 Khoo 30 3100 5600 6000 116 Baida 30 2900 6000 5700 117 Tobias 30 2800 5700 5400 118 Himuro 30 2600 5400 2600 203 Mavris 40 6500 6500 6500 198 OConnell 50 2600 2600 5200 ......
-
Centered Aggregate
- 查詢按照入職日期分組的薪水合計,以及入職日期相鄰1天的人員的平均薪水
SELECT hire_date, SUM(salary) AS sum_sal1, AVG(SUM(salary)) OVER (ORDER BY hire_date RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING) AS CENTERED_1_DAY_AVG FROM employees GROUP BY hire_date; HIRE_DATE SUM_SAL1 CENTERED_1_DAY_AVG ------------------ ---------- ------------------ 13-JAN-01 17000 17000 07-JUN-02 36808 36808 16-AUG-02 9000 10504 17-AUG-02 12008 10504 07-DEC-02 11000 11000 01-MAY-03 7900 7900 18-MAY-03 3100 3100 17-JUN-03 24000 24000 14-JUL-03 3600 3600 17-SEP-03 4400 4400 17-OCT-03 3500 3500 ......
Ranking
-
RANK:返回一個唯一的值,除非遇到相同的數據時,此時所有相同數據的排名是一樣的,同時會在最後一條相同記錄和下一條不同記錄的排名之間空出排名
-
DENSE_RANK:返回一個唯一的值,除非當碰到相同數據時,此時所有相同數據的排名都是一樣的。
-
ROW_NUMBER:返回一個唯一的值,當碰到相同數據時,排名按照記錄集中記錄的順序依次遞增。
-
查詢按部門的薪水從低到高排名人員資訊
select employee_id,last_name,department_id,salary,
RANK() over (partition by department_id order by salary) rank,
DENSE_RANK() over (partition by department_id order by salary) dense_rank,
ROW_NUMBER() over (partition by department_id order by salary) row_number
from employees where department_id=50;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY RANK DENSE_RANK ROW_NUMBER
----------- --------------- ------------- ---------- ---------- ---------- ----------
132 Olson 50 2100 1 1 1
128 Markle 50 2200 2 2 2
136 Philtanker 50 2200 2 2 3
135 Gee 50 2400 4 3 4
127 Landry 50 2400 4 3 5
131 Marlow 50 2500 6 4 6
144 Vargas 50 2500 6 4 7
182 Sullivan 50 2500 6 4 8
191 Perkins 50 2500 6 4 9
140 Patel 50 2500 6 4 10
198 OConnell 50 2600 11 5 11
......
- 查詢每個部門的薪水排名前三名人員資訊
select e.*
from (select employee_id,last_name,department_id,salary,
DENSE_RANK() over (partition by department_id order by salary desc) dense_rank
from employees) e
where dense_rank<=3;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY DENSE_RANK
----------- --------------- ------------- ---------- ----------
200 Whalen 10 4400 1
201 Hartstein 20 13000 1
202 Fay 20 6000 2
114 Raphaely 30 11000 1
115 Khoo 30 3100 2
116 Baida 30 2900 3
203 Mavris 40 6500 1
121 Fripp 50 8200 1
120 Weiss 50 8000 2
122 Kaufling 50 7900 3
103 Hunold 60 9000 1
......
LAG/LEAD
- 語法
{LAG | LEAD} ( value_expr [, offset] [, default] ) [RESPECT NULLS|IGNORE NULLS] OVER ( [query_partition_clause] order_by_clause )
lag 和lead函數可以獲取結果集中,按一定排序所排列的當前行的上下相鄰若干offset 的某個行的某個列(不用結果集的自關聯);lag,lead分別是向前,向後;lag 和lead 有三個參數,第一個參數是列名,第二個參數是偏移的offset,第三個參數是超出記錄窗口時的默認值)。lag(expression<,offset><,default>)函數可以訪問組內當前行之前的行,而lead(expression<,offset><,default>)函數則正相反,可以訪問組內當前行之後的行。其中,offset是正整數,默認為1.因組內第一個條記錄沒有之前的行,最後一行沒有之後的行,default就是用於處理這樣的資訊,默認為空.注意:這2個函數必須指定 order By 字句。
- 查詢人員薪水及其前面入職人員的薪水和後面入職人員的薪水
SELECT hire_date, last_name, salary,
LAG(salary, 1, 0 ) OVER (ORDER BY hire_date) AS prev_sal,
LEAD(salary, 1, 0 ) OVER (ORDER BY hire_date) AS next_sal
FROM employees
WHERE job_id = 'PU_CLERK'
ORDER BY hire_date;
HIRE_DATE LAST_NAME SALARY PREV_SAL NEXT_SAL
------------------ --------------- ---------- ---------- ----------
18-MAY-03 Khoo 3100 0 2800
24-JUL-05 Tobias 2800 3100 2900
24-DEC-05 Baida 2900 2800 2600
15-NOV-06 Himuro 2600 2900 2500
10-AUG-07 Colmenares 2500 2600 0
FIRST/LAST
- 語法
aggregate_function KEEP ( DENSE_RANK LAST ORDER BY expr [ DESC | ASC ] [NULLS { FIRST | LAST }] [, expr [ DESC | ASC ] [NULLS { FIRST | LAST }]]…) [OVER query_partitioning_clause]
first/last函數允許我們對某數據集進行排序,並對排序結果的第一條記錄和最後一條記錄進行處理。在查詢出第一條或者最後一條記錄後,我們需要應用一個聚合函數來處理特定列,這是為了保證返回結果的唯一性,因為排名第一的記錄和排名最後的記錄可能會存在多個。使用first/last函數可以避免自連接或者子查詢,因此可以提高處理效率。
-
使用說明
- first和last函數有over子句就是分析函數,沒有就是聚合函數。
- 函數的參數必須是數字類型(或者其他類型可轉為數字類型),返回相同類型
- aggregate_function可以是MIN,MAX,SUM,AVG,COUNT,VARIANCE,STDDEV
-
查詢人員資訊及其所在部門的最低和最高薪水
SELECT employee_id, last_name, department_id, salary,
MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY salary) OVER (PARTITION BY department_id) "Worst",
MAX(salary) KEEP (DENSE_RANK LAST ORDER BY salary) OVER (PARTITION BY department_id) "Best"
FROM employees
ORDER BY department_id, salary, last_name;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY Worst Best
----------- --------------- ------------- ---------- ---------- ----------
200 Whalen 10 4400 4400 4400
202 Fay 20 6000 6000 13000
201 Hartstein 20 13000 6000 13000
119 Colmenares 30 2500 2500 11000
118 Himuro 30 2600 2500 11000
117 Tobias 30 2800 2500 11000
116 Baida 30 2900 2500 11000
115 Khoo 30 3100 2500 11000
114 Raphaely 30 11000 2500 11000
203 Mavris 40 6500 6500 6500
132 Olson 50 2100 2100 8200
NTILE
- 語法
NTILE (expr) OVER ([query_partition_clause] order_by_clause)
- 查詢人員資訊及其對應的薪水等級,將薪水分為5個等級
SELECT employee_id,last_name,salary,
NTILE(5) OVER (ORDER BY salary DESC) AS quartile
FROM employees
WHERE department_id=30;
EMPLOYEE_ID LAST_NAME SALARY QUARTILE
----------- ------------------------- ---------- ----------
114 Raphaely 11000 1
115 Khoo 3100 1
116 Baida 2900 2
117 Tobias 2800 3
118 Himuro 2600 4
119 Colmenares 2500 5
FIRST_VALUE/LAST_VALUE
- 語法
FIRST_VALUE|LAST_VALUE (
- 查詢人員資訊及其所在部門最低薪水和最高薪水人員姓名
SELECT employee_id,last_name,department_id,salary,
FIRST_VALUE(last_name) OVER (PARTITION BY department_id ORDER BY salary) AS worst,
LAST_VALUE(last_name) OVER (PARTITION BY department_id ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS best
FROM employees order by department_id,salary;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY WORST BEST
----------- -------------------- ------------- ---------- -------------------- --------------------
200 Whalen 10 4400 Whalen Whalen
202 Fay 20 6000 Fay Hartstein
201 Hartstein 20 13000 Fay Hartstein
119 Colmenares 30 2500 Colmenares Raphaely
118 Himuro 30 2600 Colmenares Raphaely
117 Tobias 30 2800 Colmenares Raphaely
116 Baida 30 2900 Colmenares Raphaely
115 Khoo 30 3100 Colmenares Raphaely
114 Raphaely 30 11000 Colmenares Raphaely
203 Mavris 40 6500 Mavris Mavris
132 Olson 50 2100 Olson Fripp
......
LISTAGG
- 語法
LISTAGG (
- 查詢每個部門所有人員姓名並按照薪水從低到高排序
select department_id,
listagg(last_name,',') within group (order by salary) name
from employees where department_id in (10,20,30) group by department_id;
DEPARTMENT_ID NAME
------------- --------------------------------------------------
10 Whalen
20 Fay,Hartstein
30 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
select department_id,last_name,salary,
listagg(last_name,',') within group (order by salary) over (partition by department_id) name
from employees where department_id in (10,20,30);
DEPARTMENT_ID LAST_NAME SALARY NAME
------------- -------------------- ---------- --------------------------------------------------
10 Whalen 4400 Whalen
20 Fay 6000 Fay,Hartstein
20 Hartstein 13000 Fay,Hartstein
30 Colmenares 2500 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
30 Himuro 2600 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
30 Tobias 2800 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
30 Baida 2900 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
30 Khoo 3100 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
30 Raphaely 11000 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
CUME_DIST
- 語法
CUME_DIST ( ) OVER ( [query_partition_clause] order_by_clause )
- 計算每個人在本部門按照薪水排列中的相對位置
SELECT employee_id,last_name,department_id,salary,
CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cume_dist
FROM employees
WHERE department_id=30;
EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY CUME_DIST
----------- -------------------- ------------- ---------- ----------
119 Colmenares 30 2500 .166666667
118 Himuro 30 2600 .333333333
117 Tobias 30 2800 .5
116 Baida 30 2900 .666666667
115 Khoo 30 3100 .833333333
114 Raphaely 30 11000 1
PERCENT_RANK
- 語法
PERCENT_RANK () OVER ([query_partition_clause] order_by_clause)
- 計算每個人在本部門按照薪水排列中的相對位置
SELECT department_id,last_name,salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS pr
FROM employees
WHERE department_id=30;
DEPARTMENT_ID LAST_NAME SALARY PR
------------- -------------------- ---------- ----------
30 Colmenares 2500 0
30 Himuro 2600 .2
30 Tobias 2800 .4
30 Baida 2900 .6
30 Khoo 3100 .8
30 Raphaely 11000 1
Hypothetical Rank
- 語法
[RANK | DENSE_RANK | PERCENT_RANK | CUME_DIST]( constant expression [, …] ) WITHIN GROUP ( ORDER BY order by expression [ASC|DESC] [NULLS FIRST|NULLS LAST][, …] )
- 假如50部門新來一位工資4000的員工,計算該員工在50部門薪水的位置
select
RANK(50,4000) within group (order by department_id, salary) rank,
DENSE_RANK(50,4000) within group (order by department_id, salary) dense_rank,
PERCENT_RANK(50,4000) within group (order by department_id, salary) percent_rank,
cume_dist(50,4000) within group (order by department_id, salary) cume_dist
from employees where department_id=50;
RANK DENSE_RANK PERCENT_RANK CUME_DIST
---------- ---------- ------------ ----------
38 18 .822222222 .847826087
歡迎關注我的公眾號,一起學習。