Hive SQL開窗函數實戰

  • 2019 年 10 月 6 日
  • 筆記

Hive SQL開窗函數實戰

開窗函數是數據的一種查詢統計語法糖,多是用於離線統計,這同時也是大數據技術棧的應用場景。今天學習Hive SQL的開窗(窗口)函數,對比與MySQL,在MySQL8.0以前是不支援窗口函數的,而Oracle支援。

下面進入實戰練習模式:

數據欄位描述:

class    student score  math       a      20  

第一列為課程類型,比如math,第二列為學生的學號,假設是a、b、c、d這種,第三列為學生得分。下面對這個數據進行表構建與數據導入。

創建表及導入數據:

hive (cookie)> create database if not exists math;  OK  Time taken: 0.31 seconds  hive (cookie)> use math;  OK  Time taken: 0.01 seconds  hive (math)> drop table if exists math;  OK  Time taken: 0.007 seconds  hive (math)> create table math(class string,student string,score int) row format delimited fields terminated by ',';  OK  Time taken: 0.233 seconds  hive (math)> load data local inpath '/home/light/mysql/hive/math.txt' into table math;  Loading data to table math.math  OK  Time taken: 0.405 seconds  hive (math)> select * from math;  OK  math.class    math.student    math.score  math    a   20  math    b   70  math    c   100  math    d   30  math    e   60  math    f   50  math    g   50  Time taken: 0.17 seconds, Fetched: 7 row(s)  

執行下面查詢與語句:

select     class,     student,     score,     sum(score) over (partition by class order by student rows between unbounded preceding and current row) as score1,     sum(score) over (partition by class order by student) as score2,     sum(score) over (partition by class) as score3,     sum(score) over (partition by class order by student rows between 3 preceding and current row) as score4,     sum(score) over (partition by class order by student rows between 3 preceding and 1 following) as score5,     sum(score) over (partition by class order by student rows between current row and unbounded following) as score6  from math;  

上述的程式碼解釋:

如果不指定ROWS BETWEEN,默認為從起點到當前行;如果不指定ORDER BY,則將分組內所有值累加;

ROWS BETWEEN含義,也叫做WINDOW子句(這就是開窗!)

PRECEDING:往前;FOLLOWING:往後;CURRENT ROW:當前行

UNBOUNDED:起點,

  UNBOUNDED PRECEDING 表示從前面的起點,

  UNBOUNDED FOLLOWING:表示到後面的終點

結果:

結果描述:

score1:分組內從起點到當前行的score累積,例如c.score=a.score+b.score+c.score

score2:同score1。

score3:分組內(class)所有的score累加。

score4:分組內當前行+往前3行,如:e.score=b.score+c.score+d.score+e.score。(對應下表的加粗數字)。

class

student

score

score4

math

g

50

190

math

f

50

240

math

e

60

260

math

d

30

220

math

c

100

190

math

b

70

90

math

a

20

20

score5:分組內當前行+往前3行+往後1行,如:還是求上述e.score,那麼就是在前面基礎上加上後一行分數。(對應下表的加粗數字)。

class

student

score

score5

math

g

50

190

math

f

50

290

math

e

60

310

math

d

30

280

math

c

100

220

math

b

70

190

math

a

20

90

score6:分組內當前行+往後所有行,如:還是求上述e.score(對應下表的加粗數字)。

class

student

score

score6

math

g

50

50

math

f

50

100

math

e

60

160

math

d

30

190

math

c

100

290

math

b

70

360

math

a

20

380

其餘的開窗函數,比如:min、avg、max與此類似。就不在贅述。