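Tez Tuning Parameters

December 15, 2019
Notes

Background

Tez is one of the execution engines commonly used with Hive. This post covers the Tez parameters most often tuned, mainly memory settings and the number of map/reduce tasks.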

1. Memory tuning

tez.am.resource.memory.mb

Default	Description	Details
128	Size of the container allocated to the Application Master, in MB

tez.am.launch.cmd-opts

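Default	Description	Details
-Dlog4j.configurationFile=tez-container-log4j2.properties -Dtez.container.log.level=INFO -Dtez.container.root.logger=CLA	Command-line options passed when the Tez AppMaster process is launched. Do not set Xmx or Xms in these options, so that Tez can determine them automatically	No need to set this explicitly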

hive.tez.container.size

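Default	Description	Details
128	Size of the containers that the Tez AppMaster requests from the ResourceManager (RM), in MB	The container occupied by the Tez AppMaster itself is sized automatically by Tez and does not need to be set explicitly, but the size of the containers requested through the AM is controlled by this parameter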

hive.tez.java.opts

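Default	Description	Details
-Dlog4j.configurationFile=tez-container-log4j2.properties -Dtez.container.log.level=INFO -Dtez.container.root.logger=CLA	Command-line options passed when a container process is launched. Memory options can be appended after the defaults, for example: -Xmx7500m -Xms7500m	The heap is usually sized at about 80% of hive.tez.container.size. Rather than adding Xmx/Xms to this parameter directly, size the heap through tez.container.max.java.heap.fraction, described next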

tez.container.max.java.heap.fraction

Default	Description	Details
0.8	If hive.tez.java.opts does not set Xmx/Xms, Tez uses this parameter to determine them: the heap size is 0.8 * hive.tez.container.size	Prefer this parameter for adjusting the heap size
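As a rough illustration of the rule in the table above, the following Python sketch computes the heap size that results when hive.tez.java.opts carries no Xmx/Xms. The function name `derived_heap_mb` is hypothetical, and truncating to whole megabytes is an assumption of this sketch; Tez's own calculation may differ in detail.

```python
def derived_heap_mb(container_size_mb: int, heap_fraction: float = 0.8) -> int:
    """Approximate heap in MB: heap fraction * container size.

    container_size_mb corresponds to hive.tez.container.size,
    heap_fraction to tez.container.max.java.heap.fraction.
    """
    if container_size_mb <= 0:
        raise ValueError("container size must be positive")
    if not 0.0 < heap_fraction <= 1.0:
        raise ValueError("heap fraction must be in (0, 1]")
    # Truncation to whole MB is an assumption of this sketch.
    return int(container_size_mb * heap_fraction)


if __name__ == "__main__":
    for size in (2048, 4096):
        print(f"container={size} MB -> heap ~{derived_heap_mb(size)} MB")
```

For a 4096 MB container this gives a heap of roughly 3276 MB, leaving the rest for off-heap and JVM overhead.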

tez.runtime.io.sort.mb

Default	Description	Details
512	Size of the sort buffer used when sorting output, in MB	tez.runtime.io.sort.mb can be set to 40% of hive.tez.container.size, but it must not exceed 2 GB.

hive.auto.convert.join.noconditionaltask.size

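Default	Description	Details
10000000	Has no effect when hive.auto.convert.join.noconditionaltask is off. When it is on, and the combined size of n-1 of the tables/partitions in an n-way join is smaller than this value, the join is converted directly to a mapjoin (without a conditional task). The default is 10 MB	With this setting, n-1 of the tables in a multi-table JOIN are treated as one combined small side and the join is converted to a mapjoin. The value can be set to 1/3 of hive.tez.container.size.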
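The two rules of thumb above (40% of the container for the sort buffer, capped at 2 GB, and one third of the container for the mapjoin threshold) can be sketched in Python as follows. The function name `suggested_memory_settings` is hypothetical; the ratios and the cap come from the tables above and are starting points, not guarantees for any particular cluster.

```python
def suggested_memory_settings(container_size_mb: int) -> dict:
    """Derive sort-buffer and mapjoin thresholds from hive.tez.container.size."""
    if container_size_mb <= 0:
        raise ValueError("container size must be positive")
    # 40% of the container, capped at 2 GB (2048 MB) as stated in the text.
    sort_mb = min(container_size_mb * 4 // 10, 2048)
    # One third of the container; this parameter is expressed in bytes.
    join_bytes = container_size_mb * 1024 * 1024 // 3
    return {
        "hive.tez.container.size": container_size_mb,
        "tez.runtime.io.sort.mb": sort_mb,
        "hive.auto.convert.join.noconditionaltask.size": join_bytes,
    }


if __name__ == "__main__":
    # Print the result as Hive session-level `set` statements.
    for key, value in suggested_memory_settings(4096).items():
        print(f"set {key}={value};")
```

The printed lines can be pasted into a Hive session for experimentation before committing anything to the cluster configuration.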

2. Map/reduce tuning

2.1 Setting the number of map tasks

tez.grouping.min-size tez.grouping.max-size

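Default	Description	Details
50M, 1G	tez.grouping.min-size is the lower bound on the grouped split size (default 50 MB); tez.grouping.max-size is the upper bound (default 1 GB)	Decreasing both parameters improves latency; increasing them improves throughput. For example, to get four mapper tasks for 128 MB of data, set both parameters to 32 MB (33,554,432 bytes).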
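To make the effect of the two bounds concrete, here is a simplified Python model of how the grouped split size limits the number of mappers. The function `estimate_map_tasks` and its `desired_tasks` argument are hypothetical; real Tez split grouping also accounts for available cluster capacity, data locality, and file boundaries, so treat the result as an estimate only.

```python
def estimate_map_tasks(data_bytes: int, min_size: int, max_size: int,
                       desired_tasks: int = 1) -> int:
    """Estimate the mapper count under tez.grouping.min-size / max-size.

    Simplified model: the per-task size implied by desired_tasks is
    clamped into [min_size, max_size], then the data is divided by it.
    """
    if data_bytes <= 0 or desired_tasks <= 0:
        raise ValueError("data_bytes and desired_tasks must be positive")
    if not 0 < min_size <= max_size:
        raise ValueError("need 0 < min_size <= max_size")
    target = max(1, data_bytes // desired_tasks)
    group_size = min(max(target, min_size), max_size)
    # Integer ceiling division: a partial group still needs a task.
    return -(-data_bytes // group_size)


if __name__ == "__main__":
    MB = 1024 * 1024
    # The example from the table: 128 MB of data, both bounds at 32 MB.
    print(estimate_map_tasks(128 * MB, 32 * MB, 32 * MB))
```

With both bounds pinned to 32 MB, the 128 MB input from the example yields four mappers regardless of `desired_tasks`, which is why setting min and max to the same value gives the most predictable result.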

2.2 Setting the number of reduce tasks

hive.tez.auto.reducer.parallelism

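Default	Description	Details
false	Enables Tez's auto reducer parallelism feature. When set to true, Tez adjusts the number of reducers at runtime based on data size	Prefer Tez's dynamic reducer adjustment, and do not use mapred.reduce.tasks to fix the number of reducers directly. hive.tez.min.partition.factor and hive.tez.max.partition.factor only take effect when this parameter is enabled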

hive.exec.reducers.max

Default	Description	Details
1009	Maximum number of reducers allowed in a job	Only takes effect when mapred.reduce.tasks is not set.

hive.exec.reducers.bytes.per.reducer

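Default	Description	Details
256000000	Amount of data processed by each reducer; the default is 256 MB	Listed here because it appears in the formulas for the number of reducers below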

hive.tez.min.partition.factor hive.tez.max.partition.factor

maxReduces = min(hive.exec.reducers.max [1009], max((ReducerStage estimate/hive.exec.reducers.bytes.per.reducer),1)*hive.tez.max.partition.factor)

minReduces = min(hive.exec.reducers.max [1009], max((ReducerStage estimate/hive.exec.reducers.bytes.per.reducer),1)*hive.tez.min.partition.factor)

Default	Description	Details
0.25, 2	hive.tez.min.partition.factor defaults to 0.25 and hive.tez.max.partition.factor defaults to 2. Both act in the same direction: increasing the value increases the number of reducers, and decreasing it reduces the number	As the formulas show, the number of reducers is controlled by three variables: hive.exec.reducers.bytes.per.reducer, hive.tez.min.partition.factor, and hive.tez.max.partition.factor. Suppose the estimated data size for the reduce stage is 190944 bytes; then maxReduces = min(1009, max(190944/256000000, 1) * 2) = 2
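The two formulas above translate directly into a few lines of Python. The function name `reducer_bounds` is hypothetical, the defaults are the ones listed in the tables in this section (with 1009 for hive.exec.reducers.max), and rounding up to whole reducers is an assumption of this sketch rather than something the formulas specify.

```python
import math


def reducer_bounds(stage_estimate_bytes: int,
                   bytes_per_reducer: int = 256_000_000,
                   reducers_max: int = 1009,
                   min_factor: float = 0.25,
                   max_factor: float = 2.0) -> tuple:
    """Return (minReduces, maxReduces) following the formulas in the text."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # max(ReducerStage estimate / bytes.per.reducer, 1)
    base = max(stage_estimate_bytes / bytes_per_reducer, 1)
    # Rounding up to whole reducers is an assumption of this sketch.
    min_reduces = math.ceil(min(reducers_max, base * min_factor))
    max_reduces = math.ceil(min(reducers_max, base * max_factor))
    return min_reduces, max_reduces


if __name__ == "__main__":
    # The worked example from the table: a 190944-byte estimate.
    print(reducer_bounds(190944))
```

For the 190944-byte estimate this reproduces maxReduces = 2; for very large estimates both bounds are capped by hive.exec.reducers.max.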

tez.shuffle-vertex-manager.min-src-fraction tez.shuffle-vertex-manager.max-src-fraction

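Default	Description	Details
0.25, 0.75	tez.shuffle-vertex-manager.min-src-fraction defaults to 0.25 and tez.shuffle-vertex-manager.max-src-fraction defaults to 0.75. Both act in the same direction: increasing the value starts the reduce stage later, and decreasing it starts the reduce stage earlier	Example: to start the reducers only after all map tasks have finished, set both values to 1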
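The following Python sketch models how these two fractions gate reducer scheduling, assuming the linear scaling between min and max described in the Tez configuration documentation. The function `schedulable_fraction` is hypothetical and deliberately simplified; the actual vertex manager considers additional factors.

```python
def schedulable_fraction(completed_fraction: float,
                         min_src: float = 0.25,
                         max_src: float = 0.75) -> float:
    """Fraction of reduce tasks eligible for scheduling (simplified model).

    completed_fraction is the share of source (map) tasks that finished;
    min_src / max_src correspond to
    tez.shuffle-vertex-manager.min-src-fraction / max-src-fraction.
    """
    if not 0.0 <= min_src <= max_src <= 1.0:
        raise ValueError("need 0 <= min_src <= max_src <= 1")
    if completed_fraction < min_src:
        return 0.0  # too few maps done: no reducer is scheduled yet
    if completed_fraction >= max_src:
        return 1.0  # enough maps done: every reducer may be scheduled
    # Linear ramp between the two thresholds.
    return (completed_fraction - min_src) / (max_src - min_src)


if __name__ == "__main__":
    for done in (0.2, 0.5, 0.75):
        print(done, schedulable_fraction(done))
    # Both fractions at 1: reducers wait until every map task has finished.
    print(schedulable_fraction(0.99, 1.0, 1.0), schedulable_fraction(1.0, 1.0, 1.0))
```

Setting both fractions to 1 collapses the ramp, so in this model nothing is scheduled until every map task has completed, which matches the example in the table.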
