Zhihu All-in-One, Part 2: Open API Deployment, Live H5 Demo, and Shared Source Code

December 31, 2019
Notes

Author: 周蘿蔔 (Zhou Luobo)

Source: 蘿蔔大雜燴

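I previously wrote part 1 of this series, covering a Zhihu crawler, an API, and a mini program, and it was well received. In my spare time over the past few days I optimized the code, deployed the service to a cloud server, and opened the API to anyone who wants to use it.

Many readers also asked for the source code so they could try it themselves. I'm releasing it today. It isn't polished and is for reference only, and you're welcome to discuss it and help maintain it!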

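Feature enhancement: token

Since I'm opening up the API, I decided to add some simple verification. Good security is good for everyone, after all!

First, let's look at the overall request flow.

The client first calls the getToken endpoint to obtain a token that is valid for a limited time, then includes that token when calling the data endpoints.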

Token implementation

I use the third-party library itsdangerous to sign tokens:

from itsdangerous import TimedJSONWebSignatureSerializer as Serializer

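itsdangerous offers several ways to generate signed tokens. The TimedJSONWebSignatureSerializer I chose produces a JSON Web Signature with an expiration time, so the tokens we issue are only valid for a limited period. (This class was deprecated in itsdangerous 2.0 and removed in 2.1, so the code here needs an earlier release.)

Generate the signature and serialize it into a token (the payload is signed, not encrypted):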

import time

access_token_gen = Serializer(secret_key=secret_key, salt=salt, expires_in=access_token_expires_in)
timestamp = time.time()  # issue time stored in the payload
access_token = access_token_gen.dumps({
    "userid": userid,
    "iat": timestamp
})

Then, whenever a token needs to be parsed, just call loads:

s = Serializer(secret_key=secret_key, salt=salt)
data = s.loads(token)  # raises SignatureExpired or BadSignature when the token is invalid
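To make the idea of a signed, time-limited token concrete, here is a minimal standard-library sketch. It is not how itsdangerous is implemented and it does not produce the same token format; the helper names `make_token` and `read_token` are hypothetical, and the `now` parameter exists only so the expiry can be exercised without waiting.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, similar in spirit to JWS encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded.encode("ascii"))


def _sign(encoded: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode("utf-8"), encoded.encode("ascii"), hashlib.sha512).digest()


def make_token(payload: dict, secret_key: str, expires_in: int, now: float = None) -> str:
    """Return "<payload>.<signature>"; the payload is readable, only tamper-proof."""
    now = time.time() if now is None else now
    body = dict(payload, iat=now, exp=now + expires_in)
    encoded = _b64(json.dumps(body, sort_keys=True).encode("utf-8"))
    return encoded + "." + _b64(_sign(encoded, secret_key))


def read_token(token: str, secret_key: str, now: float = None) -> dict:
    """Verify the signature first, then the expiry; raise ValueError on failure."""
    now = time.time() if now is None else now
    try:
        encoded, signature = token.split(".")
        signature_bytes = _unb64(signature)
    except ValueError:
        raise ValueError("malformed token")
    if not hmac.compare_digest(_sign(encoded, secret_key), signature_bytes):
        raise ValueError("bad signature")
    body = json.loads(_unb64(encoded))
    if now >= body["exp"]:
        raise ValueError("token expired")
    return body
```

The signature is checked before the payload is trusted, so a client cannot extend its own expiry by editing the payload.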

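Access-restriction decorator

Decorators are one of Python's most powerful tools, so naturally we want to make good use of them.

In the original design, every route was directly accessible with no restrictions: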

@api.route('/api/zhihu/hot/', methods=['GET', 'POST'])
def zhihu_api_data():
    pass

Now we want to add access restrictions without changing how the view functions are written, so that only requests carrying a valid token can reach the route:

@api.route('/api/zhihu/hot/', methods=['GET', 'POST'])
@token.tokenRequired
def zhihu_api_data():
    pass

Without a doubt, this is a perfect job for a decorator:

from functools import wraps

def tokenRequired(f):
    @wraps(f)  # keep the view function's name so Flask's endpoint naming still works
    def decorated_function(*args, **kwargs):
        pass
    return decorated_function

What remains is to write the body of decorated_function, which only needs the checks we want:

if request.method == 'POST':
    post_data = json.loads(request.data)
    # Both the token and the shared passphrase must be present in the request body.
    if 'token' in post_data and 'secret' in post_data and post_data['secret'] == '周蘿蔔真帥':
        token = post_data['token']
        check_result = check_token(token)
        if check_result is True:
            return f(*args, **kwargs)
        else:
            return jsonify(check_result), 401
    # Missing token or wrong passphrase; the message means "play by the rules".
    return jsonify({'code': 422, 'message': '按套路出牌啊'}), 422

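When the request method is POST, an error response is returned if the token field is missing from the request body or if the body's secret field doesn't follow the script (so remember the passphrase: praising me is the right answer!).

Next, let's look at the check_token function, which does the actual token validation: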

def check_token(token):
    # Tokens stored under "token*" keys in redis are treated as blocked.
    token_list = [rd.get(t) for t in rd.keys("token*")]
    if token in token_list:
        # Return only the body; the caller attaches the 401 status itself.
        return {'code': 401, 'message': 'token is blocked'}
    validator = validateToken(token)
    if validator['code'] != 200:
        # Expired and otherwise invalid tokens are both returned unchanged.
        return validator
    return True

The token-blocking feature is kept in place for later use. The validateToken function simply calls loads to parse the signed token.
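The decorator pattern described above can be tried without Flask. The sketch below is framework-free and only illustrative: handlers take the already-parsed request body as a dict and return a `(body, status)` tuple, and `token_required`, `demo_check`, and `hot_list` are hypothetical names rather than code from the repository.

```python
from functools import wraps

SECRET_PHRASE = "周蘿蔔真帥"  # the passphrase quoted in the article


def token_required(check_token):
    """Build a decorator that guards a handler with `check_token`.

    `check_token(token)` is assumed to return True for a valid token and an
    error dict otherwise, mirroring the convention used in the article.
    """
    def decorator(handler):
        @wraps(handler)
        def guarded(body, *args, **kwargs):
            # Reject requests that lack the token or the passphrase.
            if "token" not in body or body.get("secret") != SECRET_PHRASE:
                return {"code": 422, "message": "bad request"}, 422
            result = check_token(body["token"])
            if result is True:
                return handler(body, *args, **kwargs)
            return result, 401
        return guarded
    return decorator


def demo_check(token):
    # Hypothetical checker: only the literal token "good" is accepted.
    if token == "good":
        return True
    return {"code": 401, "message": "token invalid"}


@token_required(demo_check)
def hot_list(body):
    return {"code": 0, "data": []}, 200
```

Passing the checker in as an argument keeps the guard independent of redis and itsdangerous, which makes it easy to exercise in isolation.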

Feature enhancement: rate limiting

Rate limiting means capping the number of requests allowed within a given period. Here I allow no more than 20 requests per minute:

REQUEST_NUM = 20

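To implement this we use before_app_request, Flask's application-wide request hook that is registered on a blueprint. A function registered with it runs before every request, so we can add our own logic there to enforce the rate limit: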

@main.before_app_request
def before_request():
    remote_add = request.remote_addr
    key = 'access_num_%s' % remote_add
    rd_add = rd.get(key)
    if rd_add:
        # Strictly less than, so at most REQUEST_NUM requests pass per window.
        if int(rd_add) < Config.REQUEST_NUM:
            rd.incr(key)
        else:
            # The message means "You're requesting too frequently!"
            return jsonify({'code': 422, 'message': '訪問太頻繁啦！'}), 422
    else:
        # First request in this window: start the counter with a 60-second expiry.
        rd.set(key, 1, ex=60)

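Each IP's request count is stored in redis under a key that expires after 60 seconds. Of course, this kind of limit only stops honest users. Why? Because getting around such an entry-level restriction is just too easy, and requests from a phone on a 4G network keep changing IP address anyway. It's tough!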
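The counting logic can be sketched in memory to see how the window behaves. This is an illustration under assumptions, not the deployed code: a dict stands in for redis, the expiry is emulated by comparing timestamps, and `FixedWindowLimiter` and its injectable `clock` are hypothetical names.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client within each `window` seconds."""

    def __init__(self, limit=20, window=60, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._counters = {}  # ip -> (window_start, count)

    def allow(self, ip):
        now = self.clock()
        start, count = self._counters.get(ip, (None, 0))
        if start is None or now - start >= self.window:
            # No live window (the role played by the redis key expiring): start a new one.
            self._counters[ip] = (now, 1)
            return True
        if count < self.limit:
            self._counters[ip] = (start, count + 1)
            return True
        return False
```

As in the redis version, the window starts at a client's first request rather than on a clock boundary. One thing to check in a real deployment: behind a reverse proxy, request.remote_addr may be the proxy's address unless forwarded headers are honored, which would make every client share one counter. Whether that applies here depends on setup the article does not show.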

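Feature enhancement: high-frequency words

In the previous article, the front end (the mini program) only showed how Zhihu hot topics trend over time. This time we add the high-frequency words found in each hot topic's answers. With jieba handling the word segmentation, this is easy to implement.

Segment the retrieved answer content and count word frequencies: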

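import re
import jieba

def cut_word(word):
    # Strip ASCII letters and digits before segmenting.
    word = re.sub('[a-zA-Z0-9]', '', word)
    with open(stopwords_path, encoding='utf-8') as f:
        # Assumes one stopword per line; a set gives exact-match lookups
        # instead of substring matches against the whole file.
        stop_words = set(f.read().split())
    counts = {}
    for w in jieba.lcut(word):
        # Skip whitespace-only tokens and stopwords.
        if w.strip() and w not in stop_words:
            counts[w] = counts.get(w, 0) + 1
    sort_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return sort_counts[:20]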

Here we strip out English letters and digits and return the 20 most frequent words.
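The counting step on its own can also be written with `collections.Counter`. The sketch below assumes the text has already been segmented into tokens (for example by jieba), so it runs without any third-party package; `top_words` is a hypothetical helper name.

```python
from collections import Counter


def top_words(tokens, stop_words, n=20):
    """Return the n most frequent tokens as (word, count) pairs."""
    counts = Counter(
        t for t in tokens
        if t.strip() and t not in stop_words  # drop blanks and stopwords
    )
    return counts.most_common(n)
```

For instance, `top_words(["知乎", "的", "熱點", "知乎"], {"的"}, n=1)` yields `[("知乎", 2)]`.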

Next, we modify the view function zhihu_api_detail:

@api.route('/api/zhihu/detail/<id>/', methods=['GET', 'POST'])
@token.tokenRequired
def zhihu_api_detail(id):
    zhihu_detail = zhihudetail(id)
    redis_word = rd.get('wordcloud_%s' % id)
    redis_content = rd.get('content_%s' % id)
    if redis_word and redis_content:
        # Cache hit: reuse the stored word counts and answer content.
        count_list = json.loads(redis_word)
        content_list = json.loads(redis_content)
    else:
        count_list = []
        count_word, content_list = zhihucontent(id)  # word-frequency data and answer content
        for count in count_word:
            count_list.append({'name': count[0], 'textSize': count[1]})
        # Cache both results for 7 days (604800 seconds).
        rd.set('wordcloud_%s' % id, json.dumps(count_list), ex=604800)
        rd.set('content_%s' % id, json.dumps(content_list), ex=604800)

    # Scale the counts up or down based on the largest one (the first entry);
    # the guard avoids an IndexError when no words were found.
    if count_list and count_list[0]['textSize'] < 10:
        for i in count_list:
            i['textSize'] = i['textSize'] * 10
    elif count_list and count_list[0]['textSize'] > 200:
        for i in count_list:
            i['textSize'] = i['textSize'] / 10

    return jsonify({'code': 0, 'data': zhihu_detail, 'count_word': count_list, 'content': content_list}), 200

Because jieba segmentation is fairly time-consuming, the processed data is saved to redis so that the next request can read it directly.
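This compute-once-then-reuse approach is often called cache-aside. A minimal sketch follows, with a small in-memory class standing in for the redis `get` and `set(..., ex=...)` calls; `TTLCache` and `cached_word_counts` are hypothetical names, and the `compute` callback represents the slow segmentation step.

```python
import json
import time


class TTLCache:
    """Tiny in-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave as if absent
            return None
        return value

    def set(self, key, value, ex):
        self._data[key] = (value, self._clock() + ex)


def cached_word_counts(cache, question_id, compute, ttl=604800):
    """Return cached counts if present; otherwise compute, store, and return them."""
    key = "wordcloud_%s" % question_id
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    result = compute(question_id)
    cache.set(key, json.dumps(result), ex=ttl)
    return result
```

The seven-day default mirrors the 604800-second expiry used in the view function; a shorter value trades freshness against how often the expensive step runs.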

Our detail page now looks like this: [Figure: detail page]

Deploying the API

Finally, we deploy the finished code to a cloud server, using the same Nginx + Gunicorn + Flask + MySQL stack as before.

Configuration details

Nginx configuration

server {
    gzip on;
    listen       443;
    server_name  www.luobodazahui.top;
    ssl on;
    root        /home/mini/mini/;
    ssl_certificate  cert/luobodazahui.top.crt;
    ssl_certificate_key cert/luobodazahui.top.key;
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    location / {
        proxy_pass       http://127.0.0.1:5002;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        index  index.html index.htm;
    }

    proxy_set_header X-Real-IP $remote_addr;
}
server {
    listen 80;
    server_name luobodazahui.top;
    rewrite ^(.*)$ https://$host$1 permanent;
}

Because the API is meant to be used by the mini program later, I set it up with a domain name and HTTPS.

Gunicorn configuration

#from gevent import monkey
#monkey.patch_all()

import multiprocessing

#debug = True
loglevel = 'debug'
bind = '127.0.0.1:5002'
#bind = '0.0.0.0:5000'
#pidfile = 'pid/gunicorn.pid'
accesslog = '/home/mini/mini/log/ser_access.log'
errorlog = '/home/mini/mini/log/ser_error.log'

workers = 1
#workers = multiprocessing.cpu_count() * 2 + 1
worker_class = 'sync'
#reload = True

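This is also a fairly simple configuration: it writes access and error logs and runs a single sync worker (the CPU-count-based setting is left commented out).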

Start script run.sh

/root/miniconda3/bin/gunicorn -D -c /home/mini/mini/gunicorn manage:app

Stop script stop.sh

kill -9 $(ps -ef | grep '/home/mini/mini/gunicorn' | grep -v grep | awk '{print $2}') >/dev/null 2>&1; echo 0

API information

Here are the APIs currently available:

API URL	Request parameters	Supported methods
https://www.luobodazahui.top/api/auth/token/	table1	POST/GET
https://www.luobodazahui.top/api/zhihu/hot/	table2	POST/GET
https://www.luobodazahui.top/api/zhihu/detail/<id>/	table3	POST/GET

table1

{
    "username": "admin",
    "pwd": "admin"
}

[Figure: request example]

table2

{
    "token": "eyJhbGciOiJIUzUxMiIsImlhdCI6MTU3NzI0NDE4MywiZXhwIjoxNTc3MjQ1OTgzfQ.eyJ1c2VyaWQiOjEsImlhdCI6MTU3NzI0NDE4My4zMjcwNjY0fQ.FptYNm0KnA8b4G_zcRJn9POrOgkiZxpvfBbzQqxoTTt7q96WeMo7Y6xCLL_oS4ksBP8jMztqopDRRqScXPKowg",
    "secret": "周蘿蔔真帥"
}

[Figure: request example]

table3

{
    "token": "eyJhbGciOiJIUzUxMiIsImlhdCI6MTU3NzI0NDE4MywiZXhwIjoxNTc3MjQ1OTgzfQ.eyJ1c2VyaWQiOjEsImlhdCI6MTU3NzI0NDE4My4zMjcwNjY0fQ.FptYNm0KnA8b4G_zcRJn9POrOgkiZxpvfBbzQqxoTTt7q96WeMo7Y6xCLL_oS4ksBP8jMztqopDRRqScXPKowg",
    "secret": "周蘿蔔真帥"
}

[Figure: request example]
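For readers who prefer code to screenshots, the request bodies above could be assembled roughly as follows. This sketch only builds the requests and never sends them. It assumes every endpoint accepts a JSON body over POST, which the article shows for the data endpoints but not explicitly for the token endpoint, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://www.luobodazahui.top"
SECRET_PHRASE = "周蘿蔔真帥"  # the passphrase the data endpoints expect


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for one of the endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_request(username, pwd):
    # table1: credentials for the token endpoint.
    return build_post("/api/auth/token/", {"username": username, "pwd": pwd})


def hot_request(token):
    # table2: token plus passphrase for the hot-list endpoint.
    return build_post("/api/zhihu/hot/", {"token": token, "secret": SECRET_PHRASE})


def detail_request(token, question_id):
    # table3: same body as table2, with the id carried in the URL path.
    return build_post(
        "/api/zhihu/detail/%s/" % question_id,
        {"token": token, "secret": SECRET_PHRASE},
    )
```

A request built this way can be sent with `urllib.request.urlopen(req)`. The response format of the token endpoint is not documented in the article, so extracting the token from the reply is left out here.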

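Future improvements

Better logging: currently only the scheduled tasks write logs and no other feature does. Logging will be added throughout to make troubleshooting easier.
API refinement: the endpoints currently return large, mixed payloads. They will be split up and given more parameters, such as querying by time.
More data: endpoints and displays for Weibo, finance, box-office, and similar data will be added later.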

Finally, here is the code repository: https://github.com/zhouwei713/Mini_Flask