­

Hive 入門 Group By 全案例【附代碼】

  • 2019 年 12 月 25 日
  • 筆記

不明就裡的讀者可以看上一篇:

Hive 的入門級 Group By 全案例

昨晚發文之後,有讀者陸陸續續在星球發問了,腳本到底該怎麼寫?

當然也有星友在第一時間拿出了自己的方案,工工整整,讓我好生欽佩。

不廢話了,下面是大家想看的具體實現。

環境:

Hive: 2.7.7  Oracle SQL Developer  Cloudera JDBC Driver  

案例 – 1 : Group by 的常規化應用

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  

結果:

image

案例 – 2 : Group by 之 Grouping Sets 應用

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),schema_id)  

結果:

image

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),type_desc)  

結果:

image

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),type_desc,())  order by schema_id ,type_desc  

結果:

image

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets(schema_id,type_desc,())  order by schema_id ,type_desc  

結果:

image

結論:

grouping sets 的作用就是將選定的分組字段,再分子組進行匯總。

(schema_id,type_desc) 用來指定細分字段組合;

單個字段,比如 schema_id, type_desc 用來指定細分的單個字段;

()用來計算總和,總計等,目標對象是符合條件的所有數據,即相當於沒有使用字段做 group by 的聚合計算。

最終將這些 grouping sets 裏面指定的細分字段聚合得到的結果聯合在一個結果集而展現出來。

案例 – 3 : Group by 之 with cube

select schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  with cube  order by schema_id ,type_desc  

結果 :

image

相當於是以下 grouping sets 的簡化版本

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),schema_id,type_desc,())  order by schema_id ,type_desc  

案例 – 4 : Group by 之 with rollup

這是一個上卷的操作,唯一一個有方向性的分組聚合操作

select schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  with rollup  order by schema_id ,type_desc  

image

select schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by type_desc,schema_id  with rollup  order by schema_id ,type_desc  

image

按照分組字段從右到左的上卷匯總,最後匯總所有符合條件的數據到一個結果集。