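BlueKing (藍鯨) single-host offline deployment: fixing the app_mgr component installation failure
- March 8, 2020
- Notes
While testing a single-host offline deployment of Tencent BlueKing (騰訊藍鯨智雲), I ran into several installation problems. This post records how I resolved one of them, "3.2 app_mgr component installation failure". The problem blocked me for a long time (possibly also because I am not familiar enough with Python or with the BlueKing product). It was solved in the end, but the process itself is what is most worth recording.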
1. Problem description
Installing the app_mgr component offline fails. Install command: ./bk_install app_mgr. The error output is as follows:
create virtualenv for paas_agent
Requirement already satisfied: pbr in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenvwrapper in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenv-clone in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: stevedore in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: virtualenv in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: pbr>=1.6 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
[192.168.1.6]20200303-174651 224 mkvirtualenv -a /data/bkce/paas_agent/paas_agent --extra-search-dir=/data/install/pip --no-download -p /usr/local/bin/python paas_agent
Already using interpreter /usr/local/bin/python
New python executable in /data/bkce/.envs/paas_agent/bin/python
Installing setuptools, pip, wheel...done.
Setting project for paas_agent to /data/bkce/paas_agent/paas_agent
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): pbr in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): virtualenvwrapper in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): virtualenv-clone in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): stevedore in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): virtualenv in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): pbr>=1.6 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): six>=1.9.0 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): supervisor in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): meld3>=0.6.5 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from supervisor)
[192.168.1.6]20200303-174801 233 generate env variable settings.
[192.168.1.6]20200303-174801 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91150>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91d50>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91f10>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c110>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c2d0>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Could not find a version that satisfies the requirement Django==1.8.11 (from -r requirements.txt (line 1)) (from versions: )
No matching distribution found for Django==1.8.11 (from -r requirements.txt (line 1))
[192.168.1.6]20200303-174900 177 pip install (--no-cache-dir ) for paas_agent. FAILED
[192.168.1.6]20200303-174900 47 Abort
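Note: an offline installation means the installation environment cannot reach the Internet. If your deployment environment is allowed to reach the external network, this component installs without trouble; I have tested that.
2. Initial analysis
First, the odd part is that only the offline installation of app_mgr fails with a network-unreachable error. Looking back at the error log above, this is what runs when the component is installed: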
[192.168.1.6]20200303-174801 233 generate env variable settings.
[192.168.1.6]20200303-174801 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
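It looks like this pip command does not use the --find-links parameter to point at a local path, so it tries to connect to the external pip index. When the other components are installed, this parameter is always specified and points to each component's own local path: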
-- for example, installing fta:
[192.168.1.6]20200302-001610 233 generate env variable settings.
[192.168.1.6]20200302-001610 151 exec: pip install --no-cache-dir --no-index --find-links=/data/src/fta/support-files/pkgs -r requirements.txt (/data/bkce/fta/fta)
-- for example, installing bkdata:
[192.168.1.6]20200302-003237 233 generate env variable settings.
[192.168.1.6]20200302-003237 151 exec: pip install --no-cache-dir --no-index --find-links=/data/src/bkdata/support-files/pkgs -r requirements.txt (/data/bkce/bkdata/dataapi)
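As the logs show, at the equivalent step these components all pass --no-index together with --find-links, each pointing to the path where its own local packages are stored.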
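The comparison above can be automated when there are many components to check. The following is a minimal sketch, assuming the installer output has been captured to a file and that its lines look like the excerpts above; the function name find_online_pip_calls is my own and is not part of the BlueKing scripts.

```shell
#!/usr/bin/env bash
# Print every logged "exec: pip install" command that lacks --find-links.
# Reads log text on stdin; the log format is assumed to match the excerpts above.
find_online_pip_calls() {
    # First grep keeps the logged pip commands, second grep drops the offline ones.
    # "|| true" keeps the exit status at 0 when every command is already offline.
    grep 'exec: pip install' | grep -v -- '--find-links' || true
}
```

For example, after saving the output with something like ./bk_install app_mgr 2>&1 | tee /tmp/app_mgr.log (a hypothetical path), running find_online_pip_calls < /tmp/app_mgr.log would list only the pip commands that still try to reach an index.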
I made a few initial attempts:
2.1 Install the packages offline with pip directly, then try installing app_mgr on its own again
pip install --no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs -r /data/bkce/paas_agent/paas_agent/requirements.txt
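The offline pip installation succeeded, but running ./bk_install app_mgr afterwards still failed, so installing manually beforehand has no effect. This is probably because the installer runs pip inside the component's own virtualenv, and a virtual environment is largely isolated from the system site-packages.
2.2 Find the pip.conf configuration files, back up the originals, and change the configuration to point at the local path
The configuration files I tried modifying were /data/src/.pip/pip.conf and /data/install/pip/pip.conf, with the content changed to: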
[global]
find-links = /data/src/paas_agent/support-files/pkgs
[install]
find-links = /data/src/paas_agent/support-files/pkgs
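But running ./bk_install app_mgr still reported the same error, so this had no effect. Later attempts turned up more pip.conf files, and modifying all of them did not help either.
2.3 Set an environment variable
The official documentation mentions an environment variable, PIP_FIND_LINKS: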
export PIP_FIND_LINKS=/data/src/paas_agent/support-files/pkgs
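I ran ./bk_install app_mgr again to install the component, and the error was unchanged. This is probably because the options are hard-coded in the installer. As with a crontab job, setting a variable from the outside has no effect, and the setting inside the scripts has to be found.
2.4 Other attempts
For example, manually adding the environment variable setting above to the app_mgr module in bk_install did not work either; the error was unchanged.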
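3. Pooling ideas
The problem had reached something of a deadlock, and something was clearly wrong. I shared the analysis above with the customer, and we agreed that it was very likely a bug, so I reported it to BlueKing customer support. Support replied that for offline installation they recommend configuring a complete local pip index. Since a full pip mirror would need close to 2 TB of requested space, I switched to building a pip index containing only the required packages. This solution also feels more like a workaround that sidesteps the real issue, because the other components do not need it; they use the find-links parameter to point to a local package directory.
Since I had never worked with this before, configuring a local pip index also took a fair amount of time searching and verifying: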
[root@rbtnode1 bin]# find /data -name pip.conf
/data/install/pip/pip.conf
/data/install/pip.conf
/data/src/service/.pip/pip.conf
/data/src/.pip/pip.conf
/data/src/pip.conf

cat /data/install/pip/pip.conf
cat /data/install/pip.conf
cat /data/src/service/.pip/pip.conf
cat /data/src/.pip/pip.conf
cat /data/src/pip.conf
cat ~/.pip/pip.conf
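It was unclear which pip.conf would actually be used, so I backed up all of the configuration files and then changed the content of every one to point at the local pip index: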
[global]
trusted-host = 192.168.1.6
index-url = http://192.168.1.6:8080/simple
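Backing up and rewriting several pip.conf files by hand is easy to get wrong, so the step can be scripted. This is a minimal sketch under my own assumptions: the helper name rewrite_pip_confs and the .bak suffix are hypothetical, and the host and URL are simply passed in as arguments.

```shell
#!/usr/bin/env bash
# Back up every pip.conf under a root directory, then point it at a local index.
# Arguments: root directory, trusted host, index URL.
rewrite_pip_confs() {
    local root=$1 host=$2 url=$3 conf
    find "$root" -name pip.conf -type f | while IFS= read -r conf; do
        # Keep the first backup so that reruns never overwrite the original.
        [ -e "$conf.bak" ] || cp -p "$conf" "$conf.bak"
        printf '[global]\ntrusted-host = %s\nindex-url = %s\n' "$host" "$url" > "$conf"
    done
}
```

With the values used in this post the call would be rewrite_pip_confs /data 192.168.1.6 http://192.168.1.6:8080/simple. Note that ~/.pip/pip.conf lies outside /data and would need a separate call.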
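For the details of configuring a local pip index, I relied on two articles found online.
But the installation attempt still failed. I then modified the globals.env configuration file: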
# Set the HTTP proxy address used to access network resources such as the yum repository, e.g.: BK_PROXY=http://192.168.0.1:8833
export BK_PROXY=http://192.168.1.6:8080/simple
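I also discussed this with a colleague. Logically, the real fix still had to be making this component pass the find-links parameter the way the other components do. The only way forward was to trace the scripts from the source myself and see whether a corresponding setting existed, starting from the main script, bk_install.
4. Final solution
Not long after I started reading the scripts I got stuck, because I rarely write scripts and it is a weak spot of mine. Going from bk_install to bkcec, I saw a large number of files being called and could not find the thread. So I went back to the original error log and noticed these lines just before the error, which looked like output from a script: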
[192.168.1.6]20200303-174801 233 generate env variable settings.
[192.168.1.6]20200303-174801 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Searching the files directly under /data/install for "generate env variable settings", I found that only utils.fc contains it:
[root@rbtnode1 install]# grep "generate env variable settings" *
grep: agent_setup: Is a directory
grep: appmgr: Is a directory
grep: bcs: Is a directory
grep: bin: Is a directory
grep: build: Is a directory
grep: deck: Is a directory
grep: extra: Is a directory
grep: health_check: Is a directory
grep: migrate: Is a directory
grep: pip: Is a directory
grep: scripts: Is a directory
grep: setuptools-36.0.1: Is a directory
grep: support-files: Is a directory
grep: templates: Is a directory
grep: uninstall: Is a directory
utils.fc: log "generate env variable settings."
grep: verify: Is a directory
[root@rbtnode1 install]# ls -l utils.fc
-rw-r--r-- 1 root root 38897 Jan 9 16:11 utils.fc
[root@rbtnode1 install]# scp utils.fc 192.168.1.61:/tmp/
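The grep above only looks at files in the top-level directory, which is why every subdirectory produces an "Is a directory" message. A recursive search covers the subdirectories as well. The sketch below wraps standard grep options in a helper whose name, find_log_source, is my own.

```shell
#!/usr/bin/env bash
# List the files under a directory tree that contain a fixed string.
# -r recurses into subdirectories, -l prints only the matching file names,
# -F treats the pattern as a literal string rather than a regular expression.
find_log_source() {
    local message=$1 dir=$2
    grep -rlF -- "$message" "$dir"
}
```

Here the call would be find_log_source "generate env variable settings" /data/install. I did not run the recursive form at the time, so whether it reports files other than utils.fc is not known.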
After copying the file over and reading it, I found a piece of code that looked relevant:
_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "
    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )

    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi

    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi

        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY pip install $pip_options -r $reqr_file   # <-- here the $pip_options passed to pip install probably has no find-links parameter
        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
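The line marked above is where the $pip_options passed to pip install probably lacks the find-links parameter, because the assignment that adds it to pip_options sits inside the if condition. I did not have time to work through the whole flow, so I tried editing utils.fc directly to add a definition of pip_options: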
_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "
    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )

    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi

    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi

        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY #pip install $pip_options -r $reqr_file   <-- the original line, now commented out; the two lines below are new and set pip_options before calling pip install
        pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        pip install $pip_options -r $reqr_file
        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
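To see why the quoted branch leaves pip online for paas_agent, its condition can be looked at in isolation. The sketch below copies only the option-selection logic into a hypothetical helper, choose_pip_options (the name, the arguments and printing the result to stdout are my own, not part of utils.fc), and writes the `-o` test as two tests joined by `||`, which behaves the same here.

```shell
#!/usr/bin/env bash
# Mirror of the option-selection branch quoted from _install_pypkgs.
# Arguments: requirements file name, local package directory.
# Looks for the SELF_CONTAINED_PIP_PKG marker file in the current directory.
choose_pip_options() {
    local reqr_file=$1 local_pip_src=$2
    local pip_options="--no-cache-dir "
    # True when the file name contains "_local" or the marker file exists.
    if [ "${reqr_file//_local/}" != "$reqr_file" ] || [ -f SELF_CONTAINED_PIP_PKG ]; then
        pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
    fi
    printf '%s\n' "$pip_options"
}
```

Called as choose_pip_options requirements.txt /data/src/paas_agent/support-files/pkgs from a directory without the marker file, it prints only --no-cache-dir, which matches the failing log. Going by the quoted code alone, a requirements file with _local in its name, or a file named SELF_CONTAINED_PIP_PKG in the project directory, would select the offline options without editing utils.fc. I did not try that route.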
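After modifying utils.fc I tested again, and the earlier failure point no longer errors. The logged exec line still shows no find-links parameter, because the log call runs before the new assignment, but the parameter is in effect when pip actually runs: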
[192.168.1.6]20200303-214725 235 generate env variable settings.
[192.168.1.6]20200303-214726 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Ignoring indexes: http://192.168.1.6:8080/simple
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
Collecting PyMySQL==0.6.7 (from -r requirements.txt (line 2))
(some output omitted..)
Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
Could not find a version that satisfies the requirement idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3)) (from versions: )
No matching distribution found for idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
[192.168.1.6]20200303-214856 177 pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent. FAILED
[192.168.1.6]20200303-214856 47 Abort
[root@rbtnode1 install]#
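But the installation then aborted again because of a missing package. idna<2.9,>=2.5 is not listed in paas_agent's requirements.txt, yet it is needed: the log shows it being pulled in as a dependency of requests==2.21.0. One way around this is to gather the packages from the other locations into a single directory (/data/localpip) and then copy them into the paas_agent package directory without overwriting the files already there: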
[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
62
[root@rbtnode1 pkgs]# cp -n /data/localpip/* ./
[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
281
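Before rerunning the installer it can save a round trip to confirm that the package pip complained about is now present. The following is a minimal sketch with two hypothetical helpers of my own: merge_pkgs copies only the files that are missing (the same effect as the cp -n above), and has_pkg does a simple, case-sensitive file-name prefix match, which is an assumption about how the package files are named.

```shell
#!/usr/bin/env bash
# Copy files from a shared package directory into a component's pkgs directory,
# skipping any file name that already exists in the destination.
merge_pkgs() {
    local src=$1 dest=$2 f
    for f in "$src"/*; do
        [ -e "$f" ] || continue                      # source directory is empty
        [ -e "$dest/${f##*/}" ] || cp -p "$f" "$dest/"
    done
}

# Succeed if the directory holds at least one file named "<name>-<something>".
has_pkg() {
    local dir=$1 name=$2 f
    for f in "$dir/$name"-*; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

With the paths from this post, that would be merge_pkgs /data/localpip /data/src/paas_agent/support-files/pkgs followed by has_pkg /data/src/paas_agent/support-files/pkgs idna && echo "idna present". A match only shows that some idna file exists, not that its version satisfies <2.9,>=2.5.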
Then I tried installing app_mgr again:
[root@rbtnode1 pkgs]# cd /data/install/
[root@rbtnode1 install]# ./bk_install app_mgr
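This time it finally succeeded. The log is below; after appt is installed successfully, the installer goes on to install appo, and both succeed: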
Collecting chardet<3.1.0,>=3.0.2 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting certifi>=2017.4.17 (from requests==2.21.0->-r requirements.txt (line 3))
Installing collected packages: Django, PyMySQL, urllib3, chardet, idna, certifi, requests, pytz, amqp, anyjson, kombu, billiard, celery, django-celery, redis, httplib2, xlrd, xlwt, MarkupSafe, Mako, Jinja2, pycrypto, gunicorn, six, SQLAlchemy, suds, supervisor, uWSGI, pytest-runner, setuptools-scm
Running setup.py install for anyjson: started
Running setup.py install for anyjson: finished with status 'done'
Running setup.py install for billiard: started
Running setup.py install for billiard: finished with status 'done'
(some output omitted..)
Successfully installed Django-1.8.11 Jinja2-2.8 Mako-1.0.4 MarkupSafe-0.23 PyMySQL-0.6.7 SQLAlchemy-1.0.12 amqp-1.4.9 anyjson-0.3.3 billiard-3.3.0.23 celery-3.1.18 certifi-2019.3.9 chardet-3.0.4 django-celery-3.2.1 gunicorn-19.6.0 httplib2-0.9.1 idna-2.8 kombu-3.0.35 pycrypto-2.6.1 pytest-runner-2.8 pytz-2016.6.1 redis-2.10.5 requests-2.21.0 setuptools-scm-1.11.1 six-1.10.0 suds-0.4 supervisor-3.3.1 uWSGI-2.0.13.1 urllib3-1.24.1 xlrd-1.0.0 xlwt-1.1.2
[192.168.1.6]20200303-222848 175 pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent. OK
[192.168.1.6]20200303-222858 453 apps isolate mode: virutalenv
Ignoring indexes: http://192.168.1.6:8080/simple
Requirement already satisfied (use --upgrade to upgrade): Django==1.8.11 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): PyMySQL==0.6.7 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 2))
(some output omitted..)
[192.168.1.6]20200303-222926 151 install python package for virtualenv paas_agent done.
[192.168.1.6]20200303-222927 468 local nginx is required for paas_agent. going to install it.
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package 1:nginx-1.12.2-2.el7.x86_64 already installed and latest version
Nothing to do
[192.168.1.6]20200303-222934 175 render: #etc#nginx.conf -> /data/bkce//etc/nginx.conf. OK
[192.168.1.6]20200303-222935 175 render: #etc#nginx#paasagent.conf -> /data/bkce//etc/nginx/paasagent.conf. OK
[192.168.1.6]20200303-222936 322 PLACE HOLDER __SID__ is replaced into empty
[192.168.1.6]20200303-222937 322 PLACE HOLDER __TOKEN__ is replaced into empty
[192.168.1.6]20200303-222937 175 render: #etc#paas_agent_config.yaml.tpl -> /data/bkce//etc/paas_agent_config.yaml. OK
[192.168.1.6]20200303-222938 175 render: #etc#supervisor-paas_agent.conf -> /data/bkce//etc/supervisor-paas_agent.conf. OK
[192.168.1.6]20200303-222939 56 install appt(allproject) done
initdata for appt()
[192.168.1.6]20200303-222946 182 exec initdata_appt on 192.168.1.6
[192.168.1.6]20200303-222958 262 update config file: paas_agent_config.yaml
[192.168.1.6]20200303-222958 268 register appt succeded.
[192.168.1.6]20200303-222958 502 create database bksuite_common
[192.168.1.6]20200303-222958 504 add version info to db
[192.168.1.6]20200303-223001 98 starting appt(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223052 77 activate appt(192.168.1.6) succeded   # appt is now installed successfully; appo is installed next
(some output omitted..)
install appo(all)
[192.168.1.6]20200303-223102 112 check dependences for paas_agent
(some output omitted..)
initdata for appo()
[192.168.1.6]20200303-223509 182 exec initdata_appo on 192.168.1.6
[192.168.1.6]20200303-223533 262 update config file: paas_agent_config.yaml
[192.168.1.6]20200303-223534 268 register appo succeded.
[192.168.1.6]20200303-223535 502 create database bksuite_common
[192.168.1.6]20200303-223535 504 add version info to db
[192.168.1.6]20200303-223541 98 starting appo(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223613 77 activate appo(192.168.1.6) succeded
[192.168.1.6] paas_agent() paas_agent RUNNING pid 23792, uptime 0:06:10
[192.168.1.6] nginx: RUNNING
[192.168.1.6] paas_agent() paas_agent RUNNING pid 23792, uptime 0:06:42
[192.168.1.6] nginx: RUNNING
[192.168.1.6] rabbitmq: RUNNING
If the steps above reported no errors, you can now complete the deployment of the production and test environments. You can:
1. deploy the node management app with ./bk_install saas-o bk_nodeman, or
2. deploy apps through the Developer Center.
To install BlueKing monitoring and log search, first install bkdata with ./bk_install bkdata
[root@rbtnode1 install]#
After a lot of stumbling, this long-standing puzzle was finally solved. Going forward I need to strengthen my Python and shell scripting skills.