Hive SQL開窗函數實戰
- 2019 年 10 月 6 日
- 筆記
Hive SQL開窗函數實戰
開窗函數是數據的一種查詢統計語法糖,多是用於離線統計,這同時也是大數據技術棧的應用場景。今天學習Hive SQL的開窗(窗口)函數,對比與MySQL,在MySQL8.0以前是不支援窗口函數的,而Oracle支援。
下面進入實戰練習模式:
數據欄位描述:
class student score math a 20
第一列為課程類型,比如math,第二列為學生的學號,假設是a、b、c、d這種,第三列為學生得分。下面對這個數據進行表構建與數據導入。
創建表及導入數據:
hive (cookie)> create database if not exists math; OK Time taken: 0.31 seconds hive (cookie)> use math; OK Time taken: 0.01 seconds hive (math)> drop table if exists math; OK Time taken: 0.007 seconds hive (math)> create table math(class string,student string,score int) row format delimited fields terminated by ','; OK Time taken: 0.233 seconds hive (math)> load data local inpath '/home/light/mysql/hive/math.txt' into table math; Loading data to table math.math OK Time taken: 0.405 seconds hive (math)> select * from math; OK math.class math.student math.score math a 20 math b 70 math c 100 math d 30 math e 60 math f 50 math g 50 Time taken: 0.17 seconds, Fetched: 7 row(s)
執行下面查詢與語句:
select class, student, score, sum(score) over (partition by class order by student rows between unbounded preceding and current row) as score1, sum(score) over (partition by class order by student) as score2, sum(score) over (partition by class) as score3, sum(score) over (partition by class order by student rows between 3 preceding and current row) as score4, sum(score) over (partition by class order by student rows between 3 preceding and 1 following) as score5, sum(score) over (partition by class order by student rows between current row and unbounded following) as score6 from math;
上述的程式碼解釋:
如果不指定ROWS BETWEEN,默認為從起點到當前行;如果不指定ORDER BY,則將分組內所有值累加;
ROWS BETWEEN含義,也叫做WINDOW子句(這就是開窗!)
PRECEDING:往前;FOLLOWING:往後;CURRENT ROW:當前行
UNBOUNDED:起點,
UNBOUNDED PRECEDING 表示從前面的起點,
UNBOUNDED FOLLOWING:表示到後面的終點
結果:

結果描述:
score1:分組內從起點到當前行的score累積,例如c.score=a.score+b.score+c.score
score2:同score1。
score3:分組內(class)所有的score累加。
score4:分組內當前行+往前3行,如:e.score=b.score+c.score+d.score+e.score。(對應下表的加粗數字)。
class |
student |
score |
score4 |
---|---|---|---|
math |
g |
50 |
190 |
math |
f |
50 |
240 |
math |
e |
60 |
260 |
math |
d |
30 |
220 |
math |
c |
100 |
190 |
math |
b |
70 |
90 |
math |
a |
20 |
20 |
score5:分組內當前行+往前3行+往後1行,如:還是求上述e.score,那麼就是在前面基礎上加上後一行分數。(對應下表的加粗數字)。
class |
student |
score |
score5 |
---|---|---|---|
math |
g |
50 |
190 |
math |
f |
50 |
290 |
math |
e |
60 |
310 |
math |
d |
30 |
280 |
math |
c |
100 |
220 |
math |
b |
70 |
190 |
math |
a |
20 |
90 |
score6:分組內當前行+往後所有行,如:還是求上述e.score(對應下表的加粗數字)。
class |
student |
score |
score6 |
---|---|---|---|
math |
g |
50 |
50 |
math |
f |
50 |
100 |
math |
e |
60 |
160 |
math |
d |
30 |
190 |
math |
c |
100 |
290 |
math |
b |
70 |
360 |
math |
a |
20 |
380 |
其餘的開窗函數,比如:min、avg、max與此類似。就不在贅述。