TKE: Performance Comparison of NodePort, Service, and LB Direct-to-Pod

February 14, 2020
Notes

1. Test background

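Kubernetes services are currently exposed to external traffic in the following ways:

NodePort
svc (access through the Kubernetes ClusterIP)
In-house LB -> Pod (for example, Pod IPs used as the nginx upstream, or the community nginx-ingress)

The first two approaches forward traffic through iptables; the third does not. This test measures the performance overhead of each of the three approaches.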

2. Test plan

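To keep the test accurate and comprehensive, we use the following tools and test data:

Pods with 2 cores and 4 GB of memory
A cluster of 5 Nodes, each with 4 cores and 8 GB
One Nginx instance with 16 cores and 32 GB as the shared LB
One test application with 2 static endpoints, corresponding to different payload sizes (4k and 100k)
Runs with 1 Pod and with 10 Pods. The more services/Pods there are, the more iptables rules each machine carries. The effect of rule count on forwarding performance was already established in the "ipvs vs. iptables mode performance comparison report": in iptables mode, with two Pods per service, performance does not drop noticeably while the total number of services stays within 2000; at 3000 to 4000 services it drops noticeably, and the more services there are, the worse it gets. Larger Pod counts are therefore not considered here.
A separate 16-core, 32 GB machine as the load generator, with wrk as the load-testing tool and QPS as the metric
Each access method is therefore tested in the following 4 cases: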

Test case	Pods	Payload size
1	1	4k
2	1	100k
3	10	4k
4	10	100k

Each case is run 5 times, and the average QPS is recorded to complete the table above.
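The plan above amounts to a small test matrix. The following is a minimal sketch of how it could be enumerated and how the five runs per case could be averaged; the mode labels and function names are illustrative assumptions, not identifiers from the original test.

```python
from itertools import product
from statistics import mean

# Illustrative labels (assumptions), one per access method described in the text.
ACCESS_MODES = ["lb-to-pod", "clusterip", "nodeport"]
POD_COUNTS = [1, 10]
PAYLOADS = ["4k", "100k"]
RUNS_PER_CASE = 5  # each case is measured 5 times


def build_matrix():
    """Return every (mode, pods, payload) combination to be measured."""
    return list(product(ACCESS_MODES, POD_COUNTS, PAYLOADS))


def average_qps(samples):
    """Average the QPS values of the repeated runs for one case."""
    if len(samples) != RUNS_PER_CASE:
        raise ValueError(f"expected {RUNS_PER_CASE} runs, got {len(samples)}")
    return mean(samples)


if __name__ == "__main__":
    for mode, pods, payload in build_matrix():
        print(f"{mode:10s} pods={pods:<2d} payload={payload}")
```

Three access methods times four cases gives 12 measured combinations, each averaged over five runs.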

3. Test procedure

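Prepare a test application (based on nginx) that serves two static files, returning 4k and 100k of data respectively. Image: ccr.ccs.tencentyun.com/caryguo/nginx:v0.1 Endpoints: http://0.0.0.0/4k.html http://0.0.0.0/100k.htm
Deploy the load-testing tool: https://github.com/wg/wrk
Deploy the cluster. Five Nodes schedule the test Pods; 10.0.4.6 is dedicated to running Nginx as the shared LB. This machine joins the cluster so that the ClusterIP can be used as an nginx upstream.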

root@VM-4-6-ubuntu:/etc/nginx# kubectl get node
NAME        STATUS                     ROLES     AGE       VERSION
10.0.4.12   Ready                      <none>    3d        v1.10.5-qcloud-rev1
10.0.4.3    Ready                      <none>    3d        v1.10.5-qcloud-rev1
10.0.4.5    Ready                      <none>    3d        v1.10.5-qcloud-rev1
10.0.4.6    Ready,SchedulingDisabled   <none>    12m       v1.10.5-qcloud-rev1
10.0.4.7    Ready                      <none>    3d        v1.10.5-qcloud-rev1
10.0.4.9    Ready                      <none>    3d        v1.10.5-qcloud-rev1

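Adjust the Nginx upstream for each test scenario, and adjust the load for each Pod count so that the request timeout rate stays below 1 in 10,000. The wrk invocations were: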

./wrk -c 200 -d 20 -t 10 http://carytest.pod.com/10k.html     # single Pod
./wrk -c 1000 -d 20 -t 100 http://carytest.pod.com/4k.html    # 10 Pods
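To check the 1-in-10,000 timeout budget after each run, the wrk text summary can be parsed. The sketch below assumes wrk's usual summary lines ("N requests in ...", "Requests/sec: ...", and a "Socket errors: ... timeout N" line that appears only when errors occurred) and assumes the timeout rate is defined as timeouts divided by completed requests; the original notes do not say how the rate was computed.

```python
import re
import subprocess

MAX_TIMEOUT_RATE = 1 / 10_000  # budget stated in the text


def run_wrk(url, connections, threads, duration=20, wrk_path="./wrk"):
    """Run wrk with the -c/-d/-t flags used above and return its text output."""
    cmd = [wrk_path, "-c", str(connections), "-d", str(duration),
           "-t", str(threads), url]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_wrk(output):
    """Extract QPS, completed requests and timeouts from a wrk summary."""
    qps = float(re.search(r"Requests/sec:\s+([\d.]+)", output).group(1))
    requests = int(re.search(r"(\d+) requests in", output).group(1))
    # The "Socket errors" line is absent when no errors occurred.
    match = re.search(r"timeout (\d+)", output)
    timeouts = int(match.group(1)) if match else 0
    return {"qps": qps, "requests": requests, "timeouts": timeouts}


def timeout_rate_ok(stats):
    """True when timeouts / requests stays within the budget (assumed definition)."""
    if stats["requests"] == 0:
        return False
    return stats["timeouts"] / stats["requests"] <= MAX_TIMEOUT_RATE
```

A run whose summary fails `timeout_rate_ok` would be repeated with fewer connections (`-c`) until it passes.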

LB -> Pod scenario: wrk -> nginx -> Pod

Test case	Pods	Payload size	Average QPS
1	1	4k	12498
2	1	100k	2037
3	10	4k	82752
4	10	100k	7743

ClusterIP (svc) scenario: wrk -> nginx -> ClusterIP -> Pod

Test case	Pods	Payload size	Average QPS
1	1	4k	12568
2	1	100k	2040
3	10	4k	81752
4	10	100k	7824

NodePort scenario: wrk -> nginx -> NodePort -> Pod

Test case	Pods	Payload size	Average QPS
1	1	4k	12332
2	1	100k	2028
3	10	4k	76973
4	10	100k	5676

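During the tests, application load stayed between 80% and 100% with 4k payloads and between 20% and 30% with 100k payloads. In the 100k case the pressure was absorbed by network transfer and never reached the backend service.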

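4. Conclusions

With a single Pod (4k or 100k payloads), the three network paths perform almost identically; QPS differs by less than 3%.
With 10 Pods and 4k payloads, LB -> Pod and svc are close, while NodePort loses about 7%.
With 10 Pods and 100k payloads, LB -> Pod and svc are close, while NodePort loses roughly a quarter of the throughput (about 27% by the figures above).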
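The percentages in these conclusions can be recomputed from the three result tables. This is a small sketch that takes LB -> Pod as the baseline (an assumption; the notes do not state which path the losses are measured against); the dictionary keys are illustrative labels.

```python
# Average QPS copied from the three result tables; keys are (pods, payload).
QPS = {
    "lb-to-pod": {(1, "4k"): 12498, (1, "100k"): 2037, (10, "4k"): 82752, (10, "100k"): 7743},
    "clusterip": {(1, "4k"): 12568, (1, "100k"): 2040, (10, "4k"): 81752, (10, "100k"): 7824},
    "nodeport":  {(1, "4k"): 12332, (1, "100k"): 2028, (10, "4k"): 76973, (10, "100k"): 5676},
}


def loss_pct(baseline, value):
    """Relative QPS loss versus the baseline, in percent (negative means faster)."""
    return (baseline - value) / baseline * 100


def losses_vs_lb(mode):
    """Loss of one access mode relative to LB -> Pod for every test case."""
    base = QPS["lb-to-pod"]
    return {case: round(loss_pct(base[case], qps), 1) for case, qps in QPS[mode].items()}


if __name__ == "__main__":
    for mode in ("clusterip", "nodeport"):
        print(mode, losses_vs_lb(mode))
```

With these inputs, NodePort comes out at about 7.0% below LB -> Pod for 10 Pods at 4k and about 26.7% below for 10 Pods at 100k, while every single-Pod case stays within 2%.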

5. Appendix

nginx configuration

user nginx;
worker_processes 50;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 100000;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;

    # Upstream under test: enable exactly one group of servers per scenario.
    upstream  panda-pod {
        #ip_hash;

        # Pod ip
        #server   10.0.4.12:30734  max_fails=2 fail_timeout=30s;
        #server   172.16.1.5:80  max_fails=2 fail_timeout=30s;
        #server   172.16.2.3:80  max_fails=2 fail_timeout=30s;
        #server   172.16.3.5:80  max_fails=2 fail_timeout=30s;
        #server   172.16.4.6:80  max_fails=2 fail_timeout=30s;
        #server   172.16.4.5:80  max_fails=2 fail_timeout=30s;
        #server   172.16.3.6:80  max_fails=2 fail_timeout=30s;
        #server   172.16.1.4:80  max_fails=2 fail_timeout=30s;
        #server   172.16.0.7:80  max_fails=2 fail_timeout=30s;
        #server   172.16.0.6:80  max_fails=2 fail_timeout=30s;
        #server   172.16.2.2:80  max_fails=2 fail_timeout=30s;

        # svc ip
        #server   172.16.255.121:80  max_fails=2 fail_timeout=30s;

        # NodePort
        server   10.0.4.12:30734   max_fails=2 fail_timeout=30s;
        server   10.0.4.3:30734    max_fails=2 fail_timeout=30s;
        server   10.0.4.5:30734    max_fails=2 fail_timeout=30s;
        server   10.0.4.7:30734    max_fails=2 fail_timeout=30s;
        server   10.0.4.9:30734    max_fails=2 fail_timeout=30s;

        # Keep idle connections to the upstream open for reuse.
        keepalive 256;
    }

    server {
        listen       80;
        server_name  carytest.pod.com;
        # root   /usr/share/nginx/html;
        charset utf-8;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
            proxy_pass        http://panda-pod;
            # HTTP/1.1 with an empty Connection header is required for upstream keepalive.
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_redirect off;
            proxy_set_header  Host  $host;
            proxy_set_header  X-Real-IP  $remote_addr;
            proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        }

        error_page 404 /404.html;
        location = /40x.html {
        }

        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
        }
    }
}
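Switching scenarios means commenting and uncommenting server lines in the upstream block by hand. As a hedged alternative, the block could be generated per scenario. This is only a sketch: the function and scenario names are hypothetical, and the addresses are the ones that appear in the configuration above.

```python
NODE_IPS = ["10.0.4.12", "10.0.4.3", "10.0.4.5", "10.0.4.7", "10.0.4.9"]
NODE_PORT = 30734
POD_IPS = ["172.16.1.5", "172.16.2.3", "172.16.3.5", "172.16.4.6", "172.16.4.5",
           "172.16.3.6", "172.16.1.4", "172.16.0.7", "172.16.0.6", "172.16.2.2"]

# Hypothetical scenario names mapped to the upstream addresses they use.
SCENARIOS = {
    "pod": [f"{ip}:80" for ip in POD_IPS],
    "clusterip": ["172.16.255.121:80"],
    "nodeport": [f"{ip}:{NODE_PORT}" for ip in NODE_IPS],
}


def upstream_block(name, servers, keepalive=256):
    """Render an nginx upstream block with the same per-server options as above."""
    lines = [f"upstream {name} {{"]
    for addr in servers:
        lines.append(f"    server {addr} max_fails=2 fail_timeout=30s;")
    lines.append(f"    keepalive {keepalive};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(upstream_block("panda-pod", SCENARIOS["nodeport"]))
```

The rendered text could be written to a file under /etc/nginx/conf.d/ and nginx reloaded before each scenario, which keeps the three configurations reproducible.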
