Tez Tuning Parameters

December 15, 2019
Notes

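Background

Tez is one of the execution engines commonly used with Hive. This post covers the Tez parameters that are tuned most often, mainly memory settings and the number of map and reduce tasks.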

1. Memory tuning

tez.am.resource.memory.mb

Default	Description	Details
128	Size of the container allocated to the Application Master, in MB

tez.am.launch.cmd-opts

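Default	Description	Details
-Dlog4j.configurationFile=tez-container-log4j2.properties -Dtez.container.log.level=INFO -Dtez.container.root.logger=CLA	Command-line options passed when the Tez AppMaster process is launched. Do not set Xmx or Xms in these options, so that Tez can determine them automatically	No need to set explicitly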

hive.tez.container.size

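Default	Description	Details
128	Size of the containers that the Tez AppMaster requests from the RM, in MB	The container size used by the Tez AppMaster itself is adjusted by Tez automatically and does not need to be set explicitly; the size of the containers requested through the AM, however, is controlled by this parameter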

hive.tez.java.opts

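Default	Description	Details
-Dlog4j.configurationFile=tez-container-log4j2.properties -Dtez.container.log.level=INFO -Dtez.container.root.logger=CLA	Command-line options passed when a container process is launched. Memory options can be appended after the default options, for example: -Xmx7500m -Xms7500m	The heap size is typically 80% of hive.tez.container.size. Rather than adding Xmx/Xms directly to this parameter, use the parameter below to adjust the heap size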

tez.container.max.java.heap.fraction

Default	Description	Details
0.8	If hive.tez.java.opts does not set Xmx/Xms, Tez uses this parameter to determine them: the value is 0.8 * hive.tez.container.size	This is the recommended way to adjust the heap size
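The relationship between the container size and the heap size can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name `default_heap_mb` is hypothetical, and truncating to whole megabytes is an assumption rather than documented Tez behaviour.

```python
def default_heap_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Approximate heap size (MB) when hive.tez.java.opts carries no -Xmx.

    container_mb  -- value of hive.tez.container.size
    heap_fraction -- value of tez.container.max.java.heap.fraction
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0.0 < heap_fraction <= 1.0:
        raise ValueError("heap_fraction must be in (0, 1]")
    # Assumption of this sketch: the result is truncated to whole megabytes.
    return int(container_mb * heap_fraction)


if __name__ == "__main__":
    for size in (2048, 4096, 8192):
        print(f"container={size} MB -> roughly -Xmx{default_heap_mb(size)}m")
```

Lowering the fraction leaves more of the container for off-heap memory, at the cost of a smaller Java heap.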

tez.runtime.io.sort.mb

Default	Description	Details
512	Size of the sort buffer used when sorting output, in MB	tez.runtime.io.sort.mb can be set to 40% of hive.tez.container.size, but it must not exceed 2 GB.

hive.auto.convert.join.noconditionaltask.size

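Default	Description	Details
10000000	Has no effect if hive.auto.convert.join.noconditionaltask is off. If it is on, and the combined size of n-1 of the tables/partitions in an n-way join is smaller than this value, the join is converted directly into a mapjoin (with no conditional task). The default is 10 MB	With this setting, n-1 of the tables in a multi-way JOIN are combined and the combined table is then used for the mapjoin. The value can be set to 1/3 of hive.tez.container.size.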
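The two rules of thumb above (sort buffer at 40% of the container, mapjoin threshold at one third of the container) can be turned into a small helper that prints Hive `SET` statements. This is a sketch under stated assumptions: the helper names are hypothetical, the 2047 MB cap is a conservative reading of "must not exceed 2 GB", and the integer rounding is a choice made here, not something the source specifies.

```python
MAX_SORT_MB = 2047  # assumed cap, chosen to stay below the 2 GB limit


def derive_memory_settings(container_mb: int) -> dict:
    """Derive sort buffer and mapjoin threshold from hive.tez.container.size."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    # 40% of the container, in MB, capped below 2 GB.
    sort_mb = min(container_mb * 40 // 100, MAX_SORT_MB)
    # One third of the container; this parameter is expressed in bytes.
    mapjoin_bytes = container_mb * 1024 * 1024 // 3
    return {
        "hive.tez.container.size": container_mb,
        "tez.runtime.io.sort.mb": sort_mb,
        "hive.auto.convert.join.noconditionaltask.size": mapjoin_bytes,
    }


def as_set_statements(settings: dict) -> list:
    """Render the settings as Hive session SET statements."""
    return [f"SET {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    for line in as_set_statements(derive_memory_settings(4096)):
        print(line)
```

The output is meant as a starting point for a session; the values still need to be checked against the actual workload and the YARN container limits of the cluster.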

2. Map/reduce tuning

2.1 Number of map tasks

tez.grouping.min-size tez.grouping.max-size

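Default	Description	Details
50 MB, 1 GB	tez.grouping.min-size: lower bound on the size of a grouped split, default 50 MB. tez.grouping.max-size: upper bound on the size of a grouped split, default 1 GB	Decreasing both parameters improves latency; increasing them improves throughput. For example, to get four mapper tasks for 128 MB of data, set both parameters to 32 MB (33,554,432 bytes).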
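The 128 MB example can be checked with a rough estimate. The sketch below assumes that tez.grouping.min-size and tez.grouping.max-size are set to the same value, so the group size is pinned; the function name is hypothetical, and the real grouping logic also takes cluster capacity and data locality into account, so the result is only an approximation.

```python
MB = 1024 * 1024


def estimate_grouped_splits(data_bytes: int, group_size_bytes: int) -> int:
    """Rough mapper count when min-size == max-size == group_size_bytes."""
    if data_bytes < 0 or group_size_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    # Ceiling division without floating point; always at least one task.
    return max(1, -(-data_bytes // group_size_bytes))


if __name__ == "__main__":
    group_size = 32 * MB  # 33,554,432 bytes, as in the example above
    print("tez.grouping.min-size =", group_size)
    print("tez.grouping.max-size =", group_size)
    print("estimated mappers for 128 MB:",
          estimate_grouped_splits(128 * MB, group_size))
```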

2.2 Number of reduce tasks

hive.tez.auto.reducer.parallelism

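Default	Description	Details
false	Enables Tez's auto reducer parallelism feature. When set to true, Tez adjusts the number of reducers at runtime based on data size	Prefer Tez's dynamic adjustment of the reducer count over setting mapred.reduce.tasks to fix the number of reducers directly. The hive.tez.min.partition.factor and hive.tez.max.partition.factor parameters below only take effect when this parameter is enabled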

hive.exec.reducers.max

Default	Description	Details
1009	Maximum number of reducers allowed in a job	Only takes effect when mapred.reduce.tasks is not set.

hive.exec.reducers.bytes.per.reducer

Default	Description	Details
256000000	Amount of data processed by each reducer; the default is 256 MB	Introduced here because it appears in the reducer-count formulas below

hive.tez.min.partition.factor hive.tez.max.partition.factor

maxReduces = min(hive.exec.reducers.max [1009], max((ReducerStage estimate / hive.exec.reducers.bytes.per.reducer), 1) * hive.tez.max.partition.factor)

minReduces = min(hive.exec.reducers.max [1009], max((ReducerStage estimate / hive.exec.reducers.bytes.per.reducer), 1) * hive.tez.min.partition.factor)

Default	Description	Details
0.25, 2	1. hive.tez.min.partition.factor defaults to 0.25. 2. hive.tez.max.partition.factor defaults to 2. Both work the same way: increasing the value increases the number of reducers, decreasing it reduces the number	As the formulas show, the reducer count is controlled by three variables: hive.exec.reducers.bytes.per.reducer, hive.tez.min.partition.factor and hive.tez.max.partition.factor. Suppose the estimated data size for the reduce stage is 190944 bytes; then maxReduces = min(1009, max(190944/256000000, 1) * 2) = 2
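The two formulas can be evaluated directly to see how the parameters interact. The sketch below follows the formulas as written in this post; the function name `reducer_bounds` is hypothetical, and truncating to an integer with a floor of one reducer is an assumption added here so that the result is a usable task count.

```python
def reducer_bounds(stage_bytes: int,
                   bytes_per_reducer: int = 256_000_000,
                   reducers_max: int = 1009,
                   min_factor: float = 0.25,
                   max_factor: float = 2.0) -> tuple:
    """Return (minReduces, maxReduces) for an estimated reduce-stage size.

    bytes_per_reducer -- hive.exec.reducers.bytes.per.reducer
    reducers_max      -- hive.exec.reducers.max
    min_factor        -- hive.tez.min.partition.factor
    max_factor        -- hive.tez.max.partition.factor
    """
    if stage_bytes < 0 or bytes_per_reducer <= 0:
        raise ValueError("sizes must be non-negative / positive")
    # max((ReducerStage estimate / bytes per reducer), 1)
    estimate = max(stage_bytes / bytes_per_reducer, 1)
    raw_min = min(reducers_max, estimate * min_factor)
    raw_max = min(reducers_max, estimate * max_factor)
    # Assumption of this sketch: truncate, and never go below one reducer.
    return max(1, int(raw_min)), max(1, int(raw_max))


if __name__ == "__main__":
    for size in (190_944, 2_560_000_000, 10**12):
        low, high = reducer_bounds(size)
        print(f"{size} bytes -> minReduces={low}, maxReduces={high}")
```

For the 190944-byte example the upper bound comes out as 2, matching the calculation in the table.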

tez.shuffle-vertex-manager.min-src-fraction tez.shuffle-vertex-manager.max-src-fraction

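Default	Description	Details
0.25, 0.75	1. tez.shuffle-vertex-manager.min-src-fraction defaults to 0.25. 2. tez.shuffle-vertex-manager.max-src-fraction defaults to 0.75. Both work the same way: increasing the value makes the reduce stage start later, decreasing it makes it start earlier	Example: to have reducers start only after all map tasks have finished, set both values to 1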
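One way to picture the effect of the two fractions is a simplified model: no reducers are scheduled until the completed share of source (map) tasks reaches the min fraction, and all of them are scheduled once it reaches the max fraction. The linear ramp in between and the function name are assumptions of this sketch, not a statement of the exact scheduling logic.

```python
def reducer_share_to_schedule(completed_sources: int,
                              total_sources: int,
                              min_fraction: float = 0.25,
                              max_fraction: float = 0.75) -> float:
    """Share of reducers (0.0 to 1.0) eligible to start in this simple model."""
    if total_sources <= 0:
        raise ValueError("total_sources must be positive")
    if not 0.0 <= min_fraction <= max_fraction <= 1.0:
        raise ValueError("need 0 <= min_fraction <= max_fraction <= 1")
    done = completed_sources / total_sources
    if done < min_fraction:
        return 0.0  # too few map tasks finished, reducers wait
    if done >= max_fraction:
        return 1.0  # enough map tasks finished, all reducers may start
    # Assumed linear ramp between the two thresholds.
    return (done - min_fraction) / (max_fraction - min_fraction)


if __name__ == "__main__":
    for finished in (10, 50, 75, 100):
        share = reducer_share_to_schedule(finished, 100)
        print(f"{finished}/100 maps done -> {share:.0%} of reducers")
    # With both fractions at 1, nothing starts until every map task is done.
    print(reducer_share_to_schedule(99, 100, 1.0, 1.0))
```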
