Environment: Deploying Atlas 2.1.0 Compatible with CDH 6.3.2
- December 16, 2020
- Notes
- ③Environment, Atlas, Atlas2.1.0, cdh
What is Atlas?
Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.
Apache Atlas provides organizations with open metadata management and governance capabilities to build a catalog of their data assets, classify and govern those assets, and give data scientists, analysts, and the data governance team collaboration capabilities around them.
Without Atlas:
Table dependencies on a big data platform are hard to manage, and metadata management has to be built in-house, e.g., Hive lineage graphs
For table dependencies there is no queryable tool, which makes it inconvenient to locate errors during business SQL development
- Official site: https://atlas.apache.org
- Table-to-table lineage dependencies
- Column-to-column lineage dependencies
1 Atlas architecture
2 Atlas installation and usage
Components required for installation: HDFS, YARN, ZooKeeper, Kafka, HBase, Solr, Hive, and a Python 2.7 environment
Requires Maven 3.5.0 or later, JDK 1.8.0_151 or later, and Python 2.7.
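Before moving on, it can be worth a quick sanity check that the toolchain matches these requirements (assuming mvn, java, and python are already on the PATH):
mvn -version      #expect Maven 3.5.0 or later
java -version     #expect JDK 1.8.0_151 or later
python --version  #expect Python 2.7.x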
2.1 Download the 2.1.0 source package and open it in IDEA
- To integrate with CDH, modify the pom file: add the Cloudera repository inside the repositories tag
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
2.2 Align component versions with the CDH versions
<lucene-solr.version>7.4.0</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
<sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
2.3 Compatibility with Hive 2.1.1
- Project location to modify:
apache-atlas-sources-2.1.0\addons\hive-bridge
① org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 577
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
Change to:
String catalogName = null;
② org/apache/atlas/hive/hook/AtlasHiveHookContext.java, line 81
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
Change to:
this.metastoreHandler = null;
2.4 Build
mvn clean -DskipTests package -Pdist -X -T 8
- The compiled artifacts are in this directory:
apache-atlas-sources-2.1.0\distro\target
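A quick way to confirm the build produced the binary tarball (path taken from the source layout above):
ls apache-atlas-sources-2.1.0/distro/target/*.tar.gz
#expect apache-atlas-2.1.0-bin.tar.gz among the output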
2.5 Install
mkdir /usr/local/src/atlas
cd /usr/local/src/atlas
#Copy apache-atlas-2.1.0-bin.tar.gz into the install directory
tar -zxvf apache-atlas-2.1.0-bin.tar.gz
cd apache-atlas-2.1.0/
2.6 Modify the configuration files
vim conf/atlas-application.properties
#HBase integration settings
atlas.graph.storage.hostname=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
#Solr integration settings
atlas.graph.index.search.solr.zookeeper-url=cdh01.cm:2181/solr,cdh02.cm:2181/solr,cdh03.cm:2181/solr
#Kafka integration settings
#false = use an external Kafka
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
atlas.kafka.bootstrap.servers=cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.enable.auto.commit=true
#Other settings
#Access address and port. Changing this value does not take effect; Atlas defaults to the local port 21000, which conflicts with Impala
atlas.rest.address=http://cdh01.cm:21000
#If enabled and set to true, the setup steps run at server startup
atlas.server.run.setup.on.start=false
atlas.audit.hbase.zookeeper.quorum=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
#Hive hook settings (append at the bottom of the file)
#Any operation in Hive is picked up by the hook, which sends a corresponding event to the Kafka topic Atlas subscribes to; Atlas then generates and stores the metadata
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
#Configure username and password (optional)
#Enable or disable the three authentication methods
atlas.authentication.method.kerberos=true|false
atlas.authentication.method.ldap=true|false
atlas.authentication.method.file=true
#vim users-credentials.properties (edit this file)
#>>> original file
#username=group::sha256-password
admin=ADMIN::8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
rangertagsync=RANGER_TAG_SYNC::e3f67240f5117d1753c940dae9eea772d36ed5fe9bd9c94a300e40413f1afb9d
#<<<
#>>> change to username bigdata123, password bigdata123
#username=group::sha256-password
bigdata123=ADMIN::aa0336d976ba6db36f33f75a20f68dd9035b1e0e2315c331c95c2dc19b2aac13
rangertagsync=RANGER_TAG_SYNC::e3f67240f5117d1753c940dae9eea772d36ed5fe9bd9c94a300e40413f1afb9d
#<<<
#Compute the sha256: echo -n "bigdata123"|sha256sum
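As a convenience, here is a small sketch that prints a ready-to-paste credentials line in one step (username and password bigdata123 assumed, as above):
#Print a complete users-credentials.properties line
echo "bigdata123=ADMIN::$(echo -n 'bigdata123' | sha256sum | awk '{print $1}')"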
vim conf/atlas-env.sh
#Integration: add the HBase config. The directory below is the HBase config directory under Atlas; the cluster's HBase config is linked into it later
export HBASE_CONF_DIR=/usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/conf
#export HBASE_CONF_DIR=/etc/hbase/conf
export MANAGE_LOCAL_HBASE=false  #false = external ZooKeeper and HBase
export MANAGE_LOCAL_SOLR=false   #false = external Solr
#Adjust memory settings (according to the production machines)
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=dumps/atlas_server.hprof
-Xloggc:logs/gc-worker.log -verbose:gc
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC
-XX:+PrintGCTimeStamps"
#Tuning for JDK 1.8 (the settings below require 16 GB of memory)
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m
-XX:MaxNewSize=5120m -XX:MetaspaceSize=100M
-XX:MaxMetaspaceSize=512m"
vim conf/atlas-log4j.xml
#Uncomment the following block (enable it)
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
<param name="file" value="${atlas.log.dir}/atlas_perf.log" />
<param name="datePattern" value="'.'yyyy-MM-dd" />
<param name="append" value="true" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d|%t|%m%n" />
</layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
<level value="debug" />
<appender-ref ref="perf_appender" />
</logger>
2.7 Integrate HBase
- Link the HBase cluster config files under apache-atlas-2.1.0/conf/hbase (the path linked here must match the atlas-env.sh setting above)
ln -s /etc/hbase/conf/ /usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/
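To confirm the link took effect, the cluster's HBase config should now be reachable through the Atlas conf directory:
ls /usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/conf/
#expect hbase-site.xml among the listed files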
2.8 Integrate Solr
- Copy the apache-atlas-2.1.0/conf/solr directory to the Solr installation directory on every Solr node and rename it to atlas-solr
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
#On the Solr nodes
cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
mv solr/ atlas-solr
#On any Solr node, change the solr user's login shell
vi /etc/passwd
change /sbin/nologin to /bin/bash
#Run the following as the solr user
su solr
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
#If a collection was created incorrectly, delete it with /opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c ${collection_name}
#Switch back to the root user to continue with the remaining setup
su root
- Solr web console: http://cdh01.cm:8983 (verify startup succeeded)
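The collections can also be checked from the command line through the Solr Collections API (host cdh01.cm assumed, as above):
curl 'http://cdh01.cm:8983/solr/admin/collections?action=LIST'
#expect vertex_index, edge_index and fulltext_index in the JSON response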
2.9 Integrate Kafka
- Create the Kafka topics
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
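A quick check that the topics were created (same ZooKeeper quorum as above):
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --list
#expect ATLAS_HOOK and ATLAS_ENTITIES in the output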
2.10 Start and test
cd /usr/local/src/atlas/apache-atlas-2.1.0/
./bin/atlas_start.py
#Stop: ./bin/atlas_stop.py
- The default username and password are both admin
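Once the UI is reachable, the REST API offers a quick liveness check (default admin/admin credentials and port 21000 assumed):
curl -u admin:admin http://cdh01.cm:21000/api/atlas/admin/version
#expect a JSON body containing the Atlas version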
2.11 Integrate Hive
- Add the atlas-application.properties config file into atlas-plugin-classloader-2.1.0.jar
#The zip must be run from this directory so the file lands at the top level of the jar
cd /usr/local/src/atlas/apache-atlas-2.1.0/conf
zip -u /usr/local/src/atlas/apache-atlas-2.1.0/hook/hive/atlas-plugin-classloader-2.1.0.jar atlas-application.properties
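To confirm the properties file really landed at the top level of the jar:
unzip -l /usr/local/src/atlas/apache-atlas-2.1.0/hook/hive/atlas-plugin-classloader-2.1.0.jar | grep atlas-application.properties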
- Modify hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
- In the Gateway client environment advanced configuration snippet (safety valve) for hive-env.sh, set HIVE_AUX_JARS_PATH:
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
- Modify the HiveServer2 advanced configuration snippet (safety valve) for hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
<property>
<name>hive.reloadable.aux.jars.path</name>
<value>/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive</value>
</property>
- In the HiveServer2 environment advanced configuration snippet, set:
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
- Ship the configured Atlas package to every Hive node, then restart the cluster
scp -r /usr/local/src/atlas/apache-atlas-2.1.0 [email protected]:/usr/local/src/atlas/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0 [email protected]:/usr/local/src/atlas/
Deploy the updated configuration and restart the cluster
- Copy the Atlas config file to /etc/hive/conf (on every node in the cluster)
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
2.12 Start Atlas again
#Start
./bin/atlas_start.py
#Stop: ./bin/atlas_stop.py
Watch the logs for errors; the main one is application.log
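To confirm the hook fires end to end, one option is a throwaway Hive DDL followed by a look at the ATLAS_HOOK topic (the table name t_atlas_smoke is just an example):
#Trigger the hook with a trivial table
hive -e "create table t_atlas_smoke(id int);"
#Watch for the hook event on the Atlas topic
kafka-console-consumer --bootstrap-server cdh01.cm:9092 --topic ATLAS_HOOK --from-beginning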
2.13 Import Hive metadata into Atlas
- Add the Hive environment variables on the Atlas node
vim /etc/profile
#>>>
#hive
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive
export HIVE_CONF_DIR=/etc/hive/conf
export PATH=$HIVE_HOME/bin:$PATH
#<<<
source /etc/profile
- Run the Atlas import script
./bin/import-hive.sh
#Enter username: admin; enter password: admin (use your own credentials if you changed them)
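After the import finishes, the entities can be spot-checked through the Atlas v2 search API (hive_table type and admin credentials assumed):
curl -u admin:admin 'http://cdh01.cm:21000/api/atlas/v2/search/basic?typeName=hive_table'
#expect the imported Hive tables in the JSON result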
Give it a try!