Hive 入门 Group By 全案例【附代码】
- 2019 年 12 月 25 日
- 笔记
不明就里的读者可以看上一篇:
昨晚发文之后,有读者陆陆续续在星球发问了,脚本到底该怎么写?
当然也有星友在第一时间拿出了自己的方案,工工整整,让我好生钦佩。
不废话了,下面是大家想看的具体实现。
环境:
Hive: 2.7.7 Oracle SQL Developer Cloudera JDBC Driver
案例 – 1 : Group by 的常规化应用
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc
结果:

image
案例 – 2 : Group by 之 Grouping Sets 应用
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc grouping sets((schema_id,type_desc),schema_id)
结果:

image
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc grouping sets((schema_id,type_desc),type_desc)
结果:

image
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc grouping sets((schema_id,type_desc),type_desc,()) order by schema_id ,type_desc
结果:

image
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc grouping sets(schema_id,type_desc,()) order by schema_id ,type_desc
结果:

image
结论:
grouping sets 的作用就是将选定的分组字段,再分子组进行汇总。
(schema_id,type_desc) 用来指定细分字段组合;
单个字段,比如 schema_id, type_desc 用来指定细分的单个字段;
()用来计算总和,总计等,目标对象是符合条件的所有数据,即相当于没有使用字段做 group by 的聚合计算。
最终将这些 grouping sets 里面指定的细分字段聚合得到的结果联合在一个结果集而展现出来。
案例 – 3 : Group by 之 with cube
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc with cube order by schema_id ,type_desc
结果 :

image
相当于是以下 grouping sets 的简化版本
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc grouping sets((schema_id,type_desc),schema_id,type_desc,()) order by schema_id ,type_desc
案例 – 4 : Group by 之 with rollup
这是一个上卷的操作,唯一一个有方向性的分组聚合操作
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by schema_id,type_desc with rollup order by schema_id ,type_desc

image
select schema_id , type_desc , count(object_id) as object_count from tblobj2 group by type_desc,schema_id with rollup order by schema_id ,type_desc

image
按照分组字段从右到左的上卷汇总,最后汇总所有符合条件的数据到一个结果集。