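TensorFlow Serving: Hands-On Case Study Notes (tf=1.4)
- March 27, 2020
- Notes
I have recently been testing some general-purpose models and projects, including CLUE (tf + pytorch), bert4keras (keras), and Kashgari (keras + tf). When it comes to deployment, the choice is between tensorflow-serving and flask. There happens to be a very good, fairly comprehensive hands-on example based on tensorflow 1.x.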

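Contents
- 1 Installing TensorFlow Serving
- 2 Converting keras H5 to tensorflow pb + hot model updates
- 2.1 Converting keras H5 to tensorflow pb
- 2.2 Hot updates
- 3 Starting tensorflow_model_server
- 4 Testing the TensorFlow Serving service
- 5 Why a Flask service is needed
- 6 One-step automatic deployment of ts + flask
- 7 Testing flask + ts
References:
- Blog post: Deploying Keras models using TensorFlow Serving and Flask
- Chinese version: 使用 TensorFlow Serving 和 Flask 部署 Keras 模型
- github: keras-and-tensorflow-serving
- Official tutorial: TensorFlow Serving
See the tutorial for the full details; the notes below cover a few key points.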
1 Installing TensorFlow Serving
There are several ways to run TensorFlow Serving (ts): with docker, or with the tensorflow_model_server binary. I find the latter less work.
    $ apt install curl
    $ echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
    $ apt-get update
    $ apt-get install tensorflow-model-server
    $ tensorflow_model_server --version
    TensorFlow ModelServer: 1.10.0-dev
    TensorFlow Library: 1.11.0
    $ python --version
    Python 3.6.6
Clone all the code from github:keras-and-tensorflow-serving for later use.
The repository is laid out as follows:
    (tensorflow) ubuntu@Himanshu:~/Desktop/Medium/keras-and-tensorflow-serving$ tree -c
    └── keras-and-tensorflow-serving
        ├── README.md
        ├── my_image_classifier
        │   └── 1
        │       ├── saved_model.pb
        │       └── variables
        │           ├── variables.data-00000-of-00001
        │           └── variables.index
        ├── test_images
        │   ├── car.jpg
        │   └── car.png
        ├── flask_server
        │   ├── app.py
        │   ├── flask_sample_request.py
        └── scripts
            ├── download_inceptionv3_model.py
            ├── inception.h5
            ├── auto_cmd.py
            ├── export_saved_model.py
            ├── imagenet_class_index.json
            └── serving_sample_request.py

    6 directories, 15 files
The other option is to deploy with docker:
    sudo nvidia-docker run -p 8500:8500 \
      -v /home/projects/resnet/weights/:/models \
      --name resnet50 -itd \
      --entrypoint=tensorflow_model_server tensorflow/serving:2.0.0-gpu \
      --port=8500 \
      --per_process_gpu_memory_fraction=0.5 \
      --enable_batching=true \
      --model_name=resnet \
      --model_base_path=/models/season &
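Reference: TensorFlow Serving + Docker + Tornado机器学习模型生产级快速部署 (fast production-grade deployment of machine learning models with TensorFlow Serving + Docker + Tornado)
2 Converting keras H5 to tensorflow pb + hot model updates
2.1 Converting keras H5 to tensorflow pb
See export_saved_model.py: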
    import tensorflow as tf

    # The export path contains the name and the version of the model
    tf.keras.backend.set_learning_phase(0)  # Ignore dropout at inference
    model = tf.keras.models.load_model('./inception.h5')
    export_path = '../my_image_classifier/1'

    # Fetch the Keras session and save the model
    # The signature definition is defined by the input and output tensors
    # And stored with the default serving key
    with tf.keras.backend.get_session() as sess:
        tf.saved_model.simple_save(
            sess,
            export_path,
            inputs={'input_image': model.input},
            outputs={t.name: t for t in model.outputs})
Pay particular attention to {'input_image': model.input}: once ts is running, the input key in every request sent to it must match this name.
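One way to double-check the input key is to read the signature back from the running server. The following is a minimal sketch, assuming your TensorFlow Serving build exposes the REST metadata endpoint (`/v1/models/<name>/metadata`; older builds may not have it) and that the response follows the layout of the hand-written sample below. The helper names `metadata_url` and `signature_input_names` are illustrative and not part of the original repository.

```python
def metadata_url(host, port, model_name):
    """Build the metadata URL (assumed endpoint, see the note above)."""
    return "http://{}:{}/v1/models/{}/metadata".format(host, port, model_name)


def signature_input_names(metadata, signature="serving_default"):
    """Return the sorted input keys of one signature from a metadata response dict."""
    sig_defs = metadata["metadata"]["signature_def"]["signature_def"]
    return sorted(sig_defs[signature]["inputs"].keys())


# Hand-written sample shaped like a metadata response (not real server output).
sample = {
    "model_spec": {"name": "ImageClassifier", "version": "1"},
    "metadata": {"signature_def": {"signature_def": {
        "serving_default": {
            "inputs": {"input_image": {"dtype": "DT_FLOAT"}},
            "outputs": {},
        }
    }}},
}

print(metadata_url("localhost", 9000, "ImageClassifier"))
print(signature_input_names(sample))
```

Against a live server, the dict would come from something like `requests.get(metadata_url(...)).json()`; the keys it returns are the ones each entry in "instances" has to use.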
If your tf version is 2.0 or later, you can choose the format directly when calling model.save() by passing save_format='tf':
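    import tensorflow as tf

    # Custom layers, losses, or metrics the model was built with, if any.
    # `dependencies` is not defined in the original snippet; this empty dict
    # is a placeholder to fill in for your own model.
    dependencies = {}

    # Load the h5 file with tf.keras's load_model first
    model_path = 'v7_resnet50_19-0.9068-0.8000.h5'
    model = tf.keras.models.load_model(model_path, custom_objects=dependencies)

    # Export the model files in tf format
    model.save('models/resnet/', save_format='tf')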
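Note that the model must be loaded with tf.keras.models.load_model here, not keras.models.load_model; only a model loaded with tf.keras.models.load_model can be exported to the model files that tfs needs. Exporting a keras model used to require a long block of builder-definition code, as in the article 《keras、tensorflow serving踩坑记》 (pitfalls with keras and tensorflow serving); now a plain model.save is enough.
2.2 Hot updates
TensorFlow Serving supports hot model updates. A typical model directory looks like this: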
    /saved_model_files
        /1      # model files for version 1
            /assets
            /variables
            saved_model.pb
        ...
        /N      # model files for version N
            /assets
            /variables
            saved_model.pb
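The subdirectories 1 to N above hold the different versions of the model. When setting --model_base_path, give only the absolute path (not a relative path) of the root directory. For example, if the structure above lives under /home/snowkylin, then --model_base_path should be /home/snowkylin/saved_model_files (without the version number). TensorFlow Serving automatically loads the model with the largest version number.
To roll out a new version, we can:
- Run the same export script on the new keras model.
- In export_saved_model.py, change export_path = '../my_image_classifier/1' to export_path = '../my_image_classifier/2'.
TensorFlow Serving automatically detects the new version under the my_image_classifier directory and switches the server over to it.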
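Instead of editing export_path by hand for every release, the next version directory can be derived from what already exists. This is a minimal sketch of that idea using only the standard library; the helper names `existing_versions` and `next_export_path` are made up for illustration, and the demo runs against a throwaway directory rather than the real my_image_classifier folder.

```python
import os
import tempfile


def existing_versions(base_path):
    """Return the numeric version subdirectories under base_path, sorted ascending."""
    versions = []
    for name in os.listdir(base_path):
        if name.isdigit() and os.path.isdir(os.path.join(base_path, name)):
            versions.append(int(name))
    return sorted(versions)


def next_export_path(base_path):
    """Return base_path/<N+1>, where N is the largest existing version (0 if none)."""
    versions = existing_versions(base_path)
    latest = versions[-1] if versions else 0
    return os.path.join(base_path, str(latest + 1))


# Demo on a temporary directory that mimics my_image_classifier/1 and /2.
demo_base = tempfile.mkdtemp()
for v in ("1", "2"):
    os.mkdir(os.path.join(demo_base, v))
os.mkdir(os.path.join(demo_base, "tmp"))  # non-numeric names are ignored

print(existing_versions(demo_base))
print(os.path.basename(next_export_path(demo_base)))
```

In export_saved_model.py the result of `next_export_path('../my_image_classifier')` could then be used as export_path.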
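3 Starting tensorflow_model_server
    tensorflow_model_server --rest_api_port=<port, e.g. 8501> --model_name=<model name> --model_base_path="<absolute path of the SavedModel directory, without the version number>"
The example in the article is an image classifier: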
tensorflow_model_server --model_base_path=/home/ubuntu/Desktop/Medium/keras-and-tensorflow-serving/my_image_classifier --rest_api_port=9000 --model_name=ImageClassifier
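- --rest_api_port: TensorFlow Serving starts a gRPC ModelServer on port 8500, and the REST API is reachable on port 9000.
- --model_name: the name under which the model is served; it appears in the URL you send POST requests to. You can pick any name.
If startup succeeds, the log looks like the following (this sample was captured with a model named voice):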
    2018-02-08 16:28:02.641662: I tensorflow_serving/model_servers/main.cc:149] Building single TensorFlow model file config: model_name: voice model_base_path: /home/yu/workspace/test/test_model/
    2018-02-08 16:28:02.641917: I tensorflow_serving/model_servers/server_core.cc:439] Adding/updating models.
    2018-02-08 16:28:02.641976: I tensorflow_serving/model_servers/server_core.cc:490] (Re-)adding model: voice
    2018-02-08 16:28:02.742740: I tensorflow_serving/core/basic_manager.cc:705] Successfully reserved resources to load servable {name: voice version: 1}
    2018-02-08 16:28:02.742800: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: voice version: 1}
    2018-02-08 16:28:02.742815: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: voice version: 1}
    2018-02-08 16:28:02.742867: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/yu/workspace/test/test_model/1
    2018-02-08 16:28:02.742906: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:236] Loading SavedModel from: /home/yu/workspace/test/test_model/1
    2018-02-08 16:28:02.755299: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-02-08 16:28:02.795329: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:155] Restoring SavedModel bundle.
    2018-02-08 16:28:02.820146: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running LegacyInitOp on SavedModel bundle.
    2018-02-08 16:28:02.832832: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:284] Loading SavedModel: success. Took 89481 microseconds.
    2018-02-08 16:28:02.834804: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: voice version: 1}
    2018-02-08 16:28:02.836855: I tensorflow_serving/model_servers/main.cc:290] Running ModelServer at 0.0.0.0:8500 ...
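Besides reading the log, the loaded versions can be checked programmatically. The sketch below assumes your TensorFlow Serving build exposes the REST model status endpoint (`GET /v1/models/<name>`; older builds may not have it) and that the reply uses a "model_version_status" list as in the hand-written sample. The helper name `available_versions` is illustrative only.

```python
def status_url(host, port, model_name):
    """Build the model status URL (assumed endpoint, see the note above)."""
    return "http://{}:{}/v1/models/{}".format(host, port, model_name)


def available_versions(status):
    """Return the version strings whose state is AVAILABLE in a status response dict."""
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


# Hand-written sample shaped like a status response (not real server output).
sample_status = {
    "model_version_status": [
        {"version": "2", "state": "AVAILABLE",
         "status": {"error_code": "OK", "error_message": ""}},
        {"version": "1", "state": "END",
         "status": {"error_code": "OK", "error_message": ""}},
    ]
}

print(status_url("localhost", 9000, "ImageClassifier"))
print(available_versions(sample_status))
```

This is also a convenient way to confirm that a hot update has taken effect: after exporting version 2, the list should contain "2".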
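4 Testing the TensorFlow Serving service

The script serving_sample_request.py sends a POST request to the TensorFlow Serving service.
The server URI is: http://<server address>:<port>/v1/models/<model name>:predict
Request body:
    { "signature_name": "<signature to call (not needed for Sequential models)>", "instances": <input data> }
Response:
    { "predictions": <return value> }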
    import argparse
    import json

    import numpy as np
    import requests
    from keras.applications import inception_v3
    from keras.preprocessing import image

    # Argument parser for giving input image_path from command line
    # ap = argparse.ArgumentParser()
    # ap.add_argument("-i", "--image", required=True,
    #                 help="path of the image")
    # args = vars(ap.parse_args())

    image_path = 'test_images/car.png'
    # Preprocessing our input image
    img = image.img_to_array(image.load_img(image_path, target_size=(224, 224))) / 255.

    # this line is added because of a bug in tf_serving(1.10.0-dev)
    img = img.astype('float16')

    payload = {
        "instances": [{'input_image': img.tolist()}]
    }

    # sending post request to TensorFlow Serving server
    r = requests.post('http://localhost:9000/v1/models/ImageClassifier:predict', json=payload)
    pred = json.loads(r.content.decode('utf-8'))

    # Decoding the response
    # decode_predictions(preds, top=5) by default gives top 5 results
    # You can pass "top=10" to get top 10 predictions
    print(json.dumps(inception_v3.decode_predictions(np.array(pred['predictions']))[0]))
The output is:
    Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
    40960/35363 [==================================] - 1s 20us/step
    [["n04285008", "sports_car", 0.998413682], ["n04037443", "racer", 0.00140099635], ["n03459775", "grille", 0.000160793832], ["n02974003", "car_wheel", 9.57861539e-06], ["n03100240", "convertible", 6.01583724e-06]]
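The URI and request/response shapes described in this section can be sketched without keras or a running server. This is a minimal illustration, not the repository's code: `build_predict_request` and `extract_predictions` are made-up helper names, and the response text is a hand-written stand-in for what the server would return.

```python
import json


def build_predict_request(host, port, model_name, instances, signature_name=None):
    """Return (url, body) for a TensorFlow Serving REST predict call."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = {"instances": instances}
    if signature_name is not None:
        # Only needed when the model exposes more than the default signature.
        body["signature_name"] = signature_name
    return url, body


def extract_predictions(response_text):
    """Pull the "predictions" field out of the JSON text of a predict response."""
    return json.loads(response_text)["predictions"]


# Tiny fake input instead of a real 224x224 image.
url, body = build_predict_request(
    "localhost", 9000, "ImageClassifier", [{"input_image": [[0.0, 0.5]]}]
)
fake_response = '{"predictions": [[0.1, 0.9]]}'

print(url)
print(json.dumps(body))
print(extract_predictions(fake_response))
```

With requests installed, the pair would be used as `requests.post(url, json=body)`, and `r.text` passed to `extract_predictions`.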
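5 Why a Flask service is needed
This section only excerpts the benefits of using ts + flask together.
As you can see, serving_sample_request.py (the frontend caller) already performs some image preprocessing steps. Here are the reasons for putting a Flask service on top of the TensorFlow Serving layer:
- When we hand an API to the frontend team, we need to make sure they are not buried in the technical details of preprocessing.
- We may not always have a Python backend server (it could be a node.js server, for example), so preprocessing with the numpy and keras libraries can be a hassle.
- If we plan to serve several models, we would have to create several TensorFlow Serving services and add new URLs to the frontend code. A Flask service keeps the domain URL the same, and we only need to add a new route (one function).
- Subscription-based access, exception handling, and other tasks can be handled in the Flask app.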
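The "one domain, one route per model" point can be sketched as a small lookup table that a Flask view function would consult before forwarding a request. Everything in this sketch is hypothetical: the second route, its port, and the model name TextClassifier do not exist in the original repository, and `backend_url` is an illustrative helper.

```python
# Public route -> (host, port, model name) of the TensorFlow Serving instance behind it.
# Only the first entry matches the article; the second is a made-up example.
BACKENDS = {
    "/imageclassifier/predict/": ("localhost", 9000, "ImageClassifier"),
    "/textclassifier/predict/": ("localhost", 9001, "TextClassifier"),
}


def backend_url(route):
    """Return the TensorFlow Serving predict URL for a public route."""
    host, port, model_name = BACKENDS[route]
    return "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)


for route in sorted(BACKENDS):
    print(route, "->", backend_url(route))
```

The frontend only ever sees the public routes; adding a model means adding one table entry and one view function, while the TensorFlow Serving ports stay internal.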

The Flask service needs only one file, flask_server/app.py:
    import base64
    import json
    from io import BytesIO

    import numpy as np
    import requests
    from flask import Flask, request, jsonify
    from keras.applications import inception_v3
    from keras.preprocessing import image

    # from flask_cors import CORS

    app = Flask(__name__)

    # Uncomment this line if you are making a Cross domain request
    # CORS(app)

    # Testing URL
    @app.route('/hello/', methods=['GET', 'POST'])
    def hello_world():
        return 'Hello, World!'

    @app.route('/imageclassifier/predict/', methods=['POST'])
    def image_classifier():
        # Decoding and pre-processing base64 image
        img = image.img_to_array(image.load_img(BytesIO(base64.b64decode(request.form['b64'])),
                                                target_size=(224, 224))) / 255.

        # this line is added because of a bug in tf_serving(1.10.0-dev)
        img = img.astype('float16')

        # Creating payload for TensorFlow serving request
        payload = {
            "instances": [{'input_image': img.tolist()}]
        }

        # Making POST request
        r = requests.post('http://localhost:9000/v1/models/ImageClassifier:predict', json=payload)

        # Decoding results from TensorFlow Serving server
        pred = json.loads(r.content.decode('utf-8'))

        # Returning JSON response to the frontend
        return jsonify(inception_v3.decode_predictions(np.array(pred['predictions']))[0])
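6 One-step automatic deployment of ts + flask
auto_cmd.py is a script that automatically starts and stops the two services (TensorFlow Serving and Flask). You can adapt it to manage more than two services.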
    import os
    import signal
    import subprocess

    # Making sure to use virtual environment libraries
    activate_this = "/home/ubuntu/tensorflow/bin/activate_this.py"
    exec(open(activate_this).read(), dict(__file__=activate_this))

    # Change directory to where your Flask's app.py is present
    os.chdir("/home/ubuntu/Desktop/Medium/keras-and-tensorflow-serving/flask_server")
    tf_ic_server = ""
    flask_server = ""

    try:
        tf_ic_server = subprocess.Popen(["tensorflow_model_server "
                                         "--model_base_path=/home/ubuntu/Desktop/Medium/keras-and-tensorflow-serving/my_image_classifier "
                                         "--rest_api_port=9000 --model_name=ImageClassifier"],
                                        stdout=subprocess.DEVNULL,
                                        shell=True,
                                        preexec_fn=os.setsid)
        print("Started TensorFlow Serving ImageClassifier server!")

        flask_server = subprocess.Popen(["export FLASK_ENV=development && flask run --host=0.0.0.0"],
                                        stdout=subprocess.DEVNULL,
                                        shell=True,
                                        preexec_fn=os.setsid)
        print("Started Flask server!")

        while True:
            print("Type 'exit' and press 'enter' OR press CTRL+C to quit: ")
            in_str = input().strip().lower()
            if in_str == 'q' or in_str == 'exit':
                print('Shutting down all servers...')
                os.killpg(os.getpgid(tf_ic_server.pid), signal.SIGTERM)
                os.killpg(os.getpgid(flask_server.pid), signal.SIGTERM)
                print('Servers successfully shutdown!')
                break
            else:
                continue
    except KeyboardInterrupt:
        print('Shutting down all servers...')
        os.killpg(os.getpgid(tf_ic_server.pid), signal.SIGTERM)
        os.killpg(os.getpgid(flask_server.pid), signal.SIGTERM)
        print('Servers successfully shutdown!')
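Change the path passed to os.chdir (line 10 of the script) so that it points to the directory containing your app.py. You may also need to change activate_this (line 6 of the script) so that it points to the bin directory of your virtual environment.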
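The start/stop mechanism in auto_cmd.py relies on putting each server in its own process group so that one SIGTERM reaches the shell and everything it spawned. Here is a minimal POSIX-only sketch of that pattern, with a sleeping Python process standing in for a real server; it uses `start_new_session=True` instead of `preexec_fn=os.setsid`, which is a substitution on my part, and the helper names are illustrative.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd):
    """Start cmd in a new session so it and its children share one process group."""
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, start_new_session=True)


def stop_group(proc, timeout=10):
    """Send SIGTERM to the whole process group and wait for the leader to exit."""
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    return proc.wait(timeout=timeout)


# Stand-in for a long-running server: a Python process that just sleeps.
sleeper = start_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_group(sleeper)
print("server stopped, return code:", exit_code)
```

Waiting on the process after signalling it, as `stop_group` does, avoids printing a shutdown message while the servers are still exiting.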
7 Testing flask + ts
    # importing the requests library
    import argparse
    import base64

    import requests

    # defining the api-endpoint
    API_ENDPOINT = "http://localhost:5000/imageclassifier/predict/"

    # taking input image via command line
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
                    help="path of the image")
    args = vars(ap.parse_args())

    image_path = args['image']
    b64_image = ""
    # Encoding the JPG,PNG,etc. image to base64 format
    with open(image_path, "rb") as imageFile:
        b64_image = base64.b64encode(imageFile.read())

    # data to be sent to api
    data = {'b64': b64_image}

    # sending post request and saving response as response object
    r = requests.post(url=API_ENDPOINT, data=data)

    # extracting the response
    print("{}".format(r.text))
Output:
    $ python flask_sample_request.py -i ../test_images/car.png
    [
      [
        "n04285008",
        "sports_car",
        0.998414
      ],
      [
        "n04037443",
        "racer",
        0.00140099
      ],
      [
        "n03459775",
        "grille",
        0.000160794
      ],
      [
        "n02974003",
        "car_wheel",
        9.57862e-06
      ],
      [
        "n03100240",
        "convertible",
        6.01581e-06
      ]
    ]
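The hand-off between flask_sample_request.py and app.py is a base64 round trip: the client encodes the raw image bytes, and the server decodes them into a file-like object for the image loader. A minimal stdlib sketch of just that step follows; the byte string is a fake stand-in for an image file, and the helper names are illustrative.

```python
import base64
from io import BytesIO


def encode_image_bytes(raw):
    """Client side: base64-encode raw image bytes for the 'b64' form field."""
    return base64.b64encode(raw)


def decode_to_file_like(b64_payload):
    """Server side: decode the 'b64' field into a file-like object."""
    return BytesIO(base64.b64decode(b64_payload))


# Fake stand-in for the contents of an image file.
raw = b"\x89PNG\r\n\x1a\n" + b"fake image body"
payload = encode_image_bytes(raw)
restored = decode_to_file_like(payload).read()

print(restored == raw)
```

Because the payload is plain ASCII, it survives being sent as an ordinary form field, which is why the client can post it with `data={'b64': ...}`.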
If you need to handle cross-origin HTTP requests, enable Flask-CORS in app.py.