入門大數據—Hive計算引擎Tez簡介和使用
一、前言
Hive默認計算引擎時MR,為了提高計算速度,我們可以改為Tez引擎。至於為什麼提高了計算速度,可以參考下圖:
用Hive直接編寫MR程序,假設有四個有依賴關係的MR作業,上圖中,綠色是Reduce Task,雲狀表示寫屏蔽,需要將中間結果持久化寫到HDFS。
Tez可以將多個有依賴的作業轉換為一個作業,這樣只需寫一次HDFS,且中間節點較少,從而大大提升作業的計算性能。
二、安裝包準備
1)下載tez的依賴包://tez.apache.org
2)拷貝apache-tez-0.9.1-bin.tar.gz到hadoop102的/opt/module目錄
[root@hadoop102 module]$ ls
apache-tez-0.9.1-bin.tar.gz
3)解壓縮apache-tez-0.9.1-bin.tar.gz
[root@hadoop102 module]$ tar -zxvf apache-tez-0.9.1-bin.tar.gz
4)修改名稱
[root@hadoop102 module]$ mv apache-tez-0.9.1-bin/ tez-0.9.1
三、在Hive中配置Tez
1)進入到Hive的配置目錄:/opt/module/hive/conf
[root@hadoop102 conf]$ pwd
/opt/module/hive/conf
2)在hive-env.sh文件中添加tez環境變量配置和依賴包環境變量配置
[root@hadoop102 conf]$ vim hive-env.sh
添加如下配置
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-2.7.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/module/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1 #是你的tez的解壓目錄
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
3)在hive-site.xml文件中添加如下配置,更改hive計算引擎
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
四、配置Tez
1)在Hive的/opt/module/hive/conf下面創建一個tez-site.xml文件
[root@hadoop102 conf]$ pwd
/opt/module/hive/conf
[root@hadoop102 conf]$ vim tez-site.xml
添加如下內容
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name> <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
<name>tez.lib.uris.classpath</name> <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<name>tez.history.logging.service.class</name> <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
</configuration>
五、上傳Tez到集群
1)將/opt/module/tez-0.9.1上傳到HDFS的/tez路徑
[root@hadoop102 conf]$ hadoop fs -mkdir /tez
[root@hadoop102 conf]$ hadoop fs -put /opt/module/tez-0.9.1/ /tez
[root@hadoop102 conf]$ hadoop fs -ls /tez
/tez/tez-0.9.1
六、測試
1)啟動Hive
[root@hadoop102 hive]$ bin/hive
2)創建LZO表
hive (default)> create table student(
id int,
name string);
3)向表中插入數據
hive (default)> insert into student values(1,"zhangsan");
4)如果沒有報錯就表示成功了
hive (default)> select * from student;
1 zhangsan
七、小結
1)運行Tez時檢查到用過多內存而被NodeManager殺死進程問題:
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking page://hadoop103:8088/cluster/app/application_1546781144082_0005Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
這種問題是從機上運行的Container試圖使用過多的內存,而被NodeManager kill掉了。
[摘錄] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
解決方法:
方案一:或者是關掉虛擬內存檢查。我們選這個,修改yarn-site.xml
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
方案二:mapred-site.xml中設置Map和Reduce任務的內存配置如下:(value中實際配置的內存需要根據自己機器內存大小及應用情況進行修改)
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>