Hive 入门 Group By 全案例【附代码】

  • 2019 年 12 月 25 日
  • 笔记

不明就里的读者可以看上一篇:

Hive 的入门级 Group By 全案例

昨晚发文之后,有读者陆陆续续在星球发问了,脚本到底该怎么写?

当然也有星友在第一时间拿出了自己的方案,工工整整,让我好生钦佩。

不废话了,下面是大家想看的具体实现。

环境:

Hive: 2.7.7  Oracle SQL Developer  Cloudera JDBC Driver  

案例 – 1 : Group by 的常规化应用

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  

结果:

image

案例 – 2 : Group by 之 Grouping Sets 应用

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),schema_id)  

结果:

image

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),type_desc)  

结果:

image

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),type_desc,())  order by schema_id ,type_desc  

结果:

image

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets(schema_id,type_desc,())  order by schema_id ,type_desc  

结果:

image

结论:

grouping sets 的作用就是将选定的分组字段,再分子组进行汇总。

(schema_id,type_desc) 用来指定细分字段组合;

单个字段,比如 schema_id, type_desc 用来指定细分的单个字段;

()用来计算总和,总计等,目标对象是符合条件的所有数据,即相当于没有使用字段做 group by 的聚合计算。

最终将这些 grouping sets 里面指定的细分字段聚合得到的结果联合在一个结果集而展现出来。

案例 – 3 : Group by 之 with cube

select schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  with cube  order by schema_id ,type_desc  

结果 :

image

相当于是以下 grouping sets 的简化版本

select  schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  grouping sets((schema_id,type_desc),schema_id,type_desc,())  order by schema_id ,type_desc  

案例 – 4 : Group by 之 with rollup

这是一个上卷的操作,唯一一个有方向性的分组聚合操作

select schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by schema_id,type_desc  with rollup  order by schema_id ,type_desc  

image

select schema_id      ,   type_desc      ,   count(object_id) as object_count  from tblobj2  group by type_desc,schema_id  with rollup  order by schema_id ,type_desc  

image

按照分组字段从右到左的上卷汇总,最后汇总所有符合条件的数据到一个结果集。