Extend Kubernetes – FlexVolume And CSI

Introduction

What are FlexVolume and CSI

Kubernetes volumes solve the problem of storing state. State can be stored in many ways, and Kubernetes concerns itself with only some of them:

| In scope (POSIX/SCSI) | Out of scope |
|---|---|
| File storage (nfs, smb) | Object storage (s3, gcs, cos) |
| Block storage (cephrbd, aws ebs) | SQL, NoSQL, TSDB |
| Files on block storage | Pub-Sub systems (kafka, aws sns) |

Volume plugins can be further divided into several categories:

| Category | Examples |
|---|---|
| Remote Storage | awsElasticBlockStore, azureDisk, azureFile, cephfs, cinder… |
| Ephemeral Storage | EmptyDir; Secret, ConfigMap, DownwardAPI |
| Local | HostPath, Local PV |
| Out-Of-Tree | Flex Volume, CSI, Other |

  • Volume operations generally follow the sequence: attach volume to node -> mount volume to pod
  • This article focuses on the out-of-tree plugins, FlexVolume and CSI

Where FlexVolume/CSI sit

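From the VolumeManager's point of view, a FlexVolume or CSI plugin is just one more kind of plugin, and it is used no differently from the others.

FlexVolume vs. CSI

  • FlexVolume: implemented as an executable binary, similar to CNI. It is simple and direct, but deployment can be troublesome: the binary has to be installed on the nodes and may bring dependencies of its own.
  • CSI: a more complex, general-purpose design. Like CNI/CRI, it aims to serve more than just Kubernetes, but implementing and deploying it is more involved.
  • Going forward, all storage plugins are recommended to be implemented as out-of-tree CSI drivers. The existing FlexVolume mode will continue to be maintained and does not need to be migrated, although a migration path is provided: flexadapter.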

FlexVolume/CSI Plugin

The basic process by which Kubernetes mounts a volume (and the components involved):

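  1. The user creates a Pod that includes a PVC
  2. The Pod is scheduled to node NodeA
  3. The Kubelet waits for the Volume Manager to prepare the device
  4. The PV controller calls the corresponding volume plugin (in-tree or out-of-tree) to create the persistent volume, creates the PV object in the system, and binds it to the PVC (Provision)
  5. The Attach/Detach controller or the Volume Manager attaches the block device through the volume plugin (Attach)
  6. The Volume Manager waits for the attach to complete, then mounts the volume to a designated directory on the node (Mount), for example /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-xxxxxxxxxxxxxxxxx
  7. Once told that the device is ready, the Kubelet starts the containers in the Pod, using options such as Docker -v to map the volume already mounted on the node into the containers (volume mapping)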
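
The mount and volume-mapping steps at the end of this list come down to two mounts on the node: one to a global, per-volume directory, and one bind mount into the Pod's own directory. The sketch below only prints the commands (a dry run) to make that visible. The function names are hypothetical, and the per-pod directory layout is an assumption based on the usual kubelet layout rather than something stated in this article.

```shell
# Dry-run sketch: prints the node-side commands instead of executing them.
KUBELET_DIR="${KUBELET_DIR:-/var/lib/kubelet}"

# Global (per-node) mount point. $1 = plugin name, $2 = volume ID.
global_mount_path() {
    printf '%s/plugins/kubernetes.io/%s/mounts/%s\n' "$KUBELET_DIR" "$1" "$2"
}

# Per-pod volume directory (assumed layout).
# $1 = pod UID, $2 = plugin name, $3 = volume name.
pod_volume_path() {
    printf '%s/pods/%s/volumes/kubernetes.io~%s/%s\n' "$KUBELET_DIR" "$1" "$2" "$3"
}

# Print the two mounts.
# $1 = device, $2 = plugin, $3 = volume ID, $4 = pod UID, $5 = volume name.
plan_mounts() {
    global=$(global_mount_path "$2" "$3")
    pod=$(pod_volume_path "$4" "$2" "$5")
    echo "mount $1 $global"
    echo "mount --bind $global $pod"
}
```

For example, `plan_mounts /dev/xvdf aws-ebs vol-1 uid-1 data` prints the device mount followed by the bind mount into the Pod directory; the container runtime then maps that Pod directory into the container.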

FlexVolume

FlexVolume execution flow

[Figure: FlexVolume execution flow]

Protocol and implementation

  • The plugin binary must be deployed on each node in advance
  • The binary must implement the following methods; mount parameters and similar options are passed to it as JSON arguments
    • init: called during Kubelet and Controller Manager initialization.
    • attach: called by the Controller Manager; attaches the volume specified by the given spec on the given node.
    • detach: called by the Controller Manager; detaches the volume from the node.
    • waitforattach: called by the Controller Manager; waits for the volume to be attached on the remote node.
    • isattached: called by the Controller Manager; checks whether the volume is attached on the node.
    • (un)mountdevice: called by the Kubelet; mountdevice mounts the device to a global path which individual pods can then bind mount.
    • (un)mount: called by the Kubelet; mounts the volume at the mount dir.
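
To make the call-out contract listed above concrete, here is a minimal sketch of a driver's dispatch logic. It assumes the driver is invoked as `<driver> <operation> [args...]` and answers with a single JSON object on stdout. The function names `flex_reply` and `flex_dispatch` are hypothetical, the mount and unmount branches only validate their arguments, and returning a non-zero code for unsupported operations is an assumption of this sketch.

```shell
# Print a FlexVolume status object. $1 = status, $2 = optional message.
# The message is inserted verbatim, so it must not contain double quotes.
flex_reply() {
    if [ -n "${2:-}" ]; then
        printf '{"status": "%s", "message": "%s"}\n' "$1" "$2"
    else
        printf '{"status": "%s"}\n' "$1"
    fi
}

# Dispatch one call-out. Returns 0 on success, 1 otherwise.
flex_dispatch() {
    op="${1:-}"
    case "$op" in
        init)
            # "attach": false tells Kubernetes there is no attach/detach step.
            printf '{"status": "Success", "capabilities": {"attach": false}}\n'
            ;;
        mount)
            # $2 = mount dir, $3 = JSON options; a real driver would mount here.
            [ $# -ge 3 ] || { flex_reply Failure "mount needs a dir and json options"; return 1; }
            flex_reply Success
            ;;
        unmount)
            # $2 = mount dir; a real driver would unmount here.
            [ $# -ge 2 ] || { flex_reply Failure "unmount needs a dir"; return 1; }
            flex_reply Success
            ;;
        *)
            # attach, detach, waitforattach, ... are not implemented.
            flex_reply "Not supported"
            return 1
            ;;
    esac
}
```

A real driver script would end with something like `flex_dispatch "$@"` and exit with its return code. A driver that reports `"attach": false` from init, as here, only ever needs the Kubelet-side mount and unmount calls.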

CSI

CSI execution flow

[Figure: CSI execution flow]

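Deployment

  • StatefulSet: the replica count is 1, which guarantees that only one instance is running. It contains three containers:
    • The user-implemented CSI driver plugin
    • External Attacher: a sidecar container provided by Kubernetes. It watches VolumeAttachment and PersistentVolume objects for changes and calls the CSI plugin's ControllerPublishVolume and ControllerUnpublishVolume APIs to attach a volume to, or detach it from, the specified node
    • External Provisioner: a sidecar container provided by Kubernetes. It watches PersistentVolumeClaim objects for changes and calls the CSI plugin's CreateVolume and DeleteVolume APIs to provision and delete volumes
  • DaemonSet: runs the CSI plugin on every node so that the Kubelet can call it. It contains two containers:
    • The user-implemented CSI driver plugin
    • Driver Registrar: registers the CSI plugin with the kubelet and initializes the NodeId (that is, it adds the annotation csi.volume.kubernetes.io/nodeid to the Node object)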

CSI lifecycle. Shown here is one example, a dynamically provisioned volume on a block device; for the more complete set of examples, see the CSI spec.

   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----+---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+             +---v----+---+             +-+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
              Stage |    | Unstage
             Volume |    | Volume
                +---v----+---+
                |  VOL_READY |
                +------------+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+

Figure 6: The lifecycle of a dynamically provisioned volume, from
creation to destruction, when the Node Plugin advertises the
STAGE_UNSTAGE_VOLUME capability.
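
The lifecycle in the figure can be read as a small state machine: each RPC is only valid in one state and moves the volume to the next. The sketch below encodes just the transitions drawn in the figure. The function names and the `NONE` state (standing in for the unlabeled start and end boxes) are hypothetical, and it deliberately ignores details such as publishing the same volume to several targets.

```shell
# Next state for a (state, RPC) pair, following the lifecycle figure.
# Prints the new state, or returns 1 if the RPC is not valid in that state.
csi_next_state() {
    case "$1:$2" in
        NONE:CreateVolume)                    echo CREATED ;;
        CREATED:ControllerPublishVolume)      echo NODE_READY ;;
        NODE_READY:NodeStageVolume)           echo VOL_READY ;;
        VOL_READY:NodePublishVolume)          echo PUBLISHED ;;
        PUBLISHED:NodeUnpublishVolume)        echo VOL_READY ;;
        VOL_READY:NodeUnstageVolume)          echo NODE_READY ;;
        NODE_READY:ControllerUnpublishVolume) echo CREATED ;;
        CREATED:DeleteVolume)                 echo NONE ;;
        *) return 1 ;;
    esac
}

# Apply a sequence of RPCs starting from NONE; prints the final state.
# Returns 1 as soon as an RPC is called in the wrong state.
csi_walk() {
    state=NONE
    for rpc in "$@"; do
        state=$(csi_next_state "$state" "$rpc") || return 1
    done
    echo "$state"
}
```

Calling `csi_walk CreateVolume ControllerPublishVolume NodeStageVolume NodePublishVolume` ends in PUBLISHED, while skipping a step (for example calling NodePublishVolume right after CreateVolume) fails, which mirrors the ordering the sidecars and the Kubelet have to respect.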

Protocol and implementation

SPEC

proto

| Service | Purpose | Methods |
|---|---|---|
| Identity | Lets Kubernetes and the CSI plugin negotiate version information | GetPluginInfo, GetPluginCapabilities, Probe |
| Controller | Creates, deletes, and manages volumes | CreateVolume/DeleteVolume, ControllerPublishVolume/ControllerUnpublishVolume, ListVolumes, CreateSnapshot, ControllerExpandVolume… |
| Node | Mounts a volume into the specified directory so that the Kubelet can use it when creating containers (must listen on /var/lib/kubelet/plugins/SanitizedCSIDriverName/csi.sock) | NodeStageVolume/NodeUnstageVolume, NodePublishVolume/NodeUnpublishVolume, NodeExpandVolume… |

The remaining functionality can be covered by pairing the plugin with the sidecars implemented by the Kubernetes team.

Common FlexVolume/CSI plugin implementations

FlexVolume Plugins

See the references.

CSI Plugins

Here are some simple examples; most of them implement only the Node service.

Here we can look at the implementation of gcp-compute-persistent-disk-csi-driver, which is fairly complete:

  • GCEIdentityServer
  • GCENodeServer
    • Node(Un)PublishVolume -> Mount
    • Node(Un)StageVolume -> MountAndFormat
    • NodeExpandVolume -> Resizefs
  • GCEControllerServer
    • Create/DeleteVolume -> Call CloudProvider to create Volume
    • Controller(Un)PublishVolume -> Call CloudProvider to attach Volume
    • CreateSnapshot -> Call CloudProvider create snapshot for volume
    • ControllerExpandVolume -> Call CloudProvider to resize disk

Practice

Implementing a FlexVolume plugin

Here is an example that implements a FlexVolume on top of cosfs:

#!/usr/bin/env bash

# Notes:
#  - Please install "jq" package before using this driver.
#  - Please install "cosfs > 1.5.0" package before using this driver.

# warning: do not edit this line, this may be replaced by deploy.sh
DEBUG_FLEX_COS="${DEBUG_FLEX_COS:-false}"

usage() {
    err "Invalid usage. Usage:\n"
    err "\t$0 init\n"
    err "\t$0 mount <mount dir> <json params>\n"
    err "\t$0 unmount <mount dir>\n"
    exit 1
}

# Append a timestamped line to the driver's log file.
logtofile() {
    echo "[$(date)] $*" >> /var/log/flexcos.log
}

# Write to stderr.
err() {
    echo -ne "$*" 1>&2
}

# Write to stdout; this is where Kubernetes reads the JSON reply.
log() {
    echo -ne "$*" >&1
}

# Echo "1" if ${MNTPATH} is currently a mount point, "0" otherwise.
ismounted() {
    MOUNT=$(findmnt -n "${MNTPATH}" 2>/dev/null | cut -d' ' -f1)
    if [ "${MOUNT}" == "${MNTPATH}" ]; then
        echo "1"
    else
        echo "0"
    fi
}

# domount <mount dir> <json params>
domount() {
    MNTPATH=$1
    APPID=$(echo "$2" | jq -r '.["appid"]')
    BUCKET=$(echo "$2" | jq -r '.["bucket"]')
    REMOTE=$(echo "$2" | jq -r '.["remote"]')
    DIR=$(echo "$2" | jq -r '.["dir"]')
    SECRETID=$(echo "$2" | jq -r '.["secretid"]')
    SECRETKEY=$(echo "$2" | jq -r '.["secretkey"]')
    DEBUGLEVEL="${DEBUGLEVEL:-info}"

    # Normalize DIR to an absolute path inside the bucket; default to "/".
    if [[ "$DIR" != "null" ]] && [[ "$DIR" != "" ]]; then
        if [[ "$DIR" != /* ]]; then
            DIR=/${DIR}
        fi
    else
        DIR="/"
    fi

    DIR=:${DIR}

    if [ "$(ismounted)" -eq 1 ]; then
        message='{"status": "Success"}'
        log "${message}"
        logtofile "${APPID}:${BUCKET} already mounted, ${message}"
        exit 0
    fi

    mkdir -p "${MNTPATH}" &> /dev/null
    mkdir -p "/data/cache/${MNTPATH}/cos"
    echo "${BUCKET}-${APPID}:${SECRETID}:${SECRETKEY}" > "/data/cache/${MNTPATH}/passwd"
    chmod 600 "/data/cache/${MNTPATH}/passwd"

    if [ "$DEBUG_FLEX_COS" == "true" ]; then
        logtofile "cosfs ${BUCKET}-${APPID}${DIR} ${MNTPATH} -ourl=${REMOTE} -odbglevel=${DEBUGLEVEL} -oallow_other -ouse_cache=/data/cache/${MNTPATH}/cos -odel_cache -oensure_diskfree=5000 -opasswd_file=/data/cache/${MNTPATH}/passwd"
    fi
    out=$(cosfs "${BUCKET}-${APPID}${DIR}" "${MNTPATH}" \
        -ourl="${REMOTE}" \
        -odbglevel="${DEBUGLEVEL}" \
        -oallow_other \
        -ouse_cache="/data/cache/${MNTPATH}/cos" \
        -odel_cache \
        -oensure_diskfree=5000 \
        -opasswd_file="/data/cache/${MNTPATH}/passwd" 2>&1)

    if [ $? -ne 0 ]; then
        message="{ \"status\": \"Failure\", \"message\": \"Failed to mount ${APPID}:${BUCKET} at ${MNTPATH}\"}"
        err "${message}"
        logtofile "${message}${out}"
        exit 1
    fi

    message='{"status": "Success"}'
    log "${message}"
    logtofile "${APPID}:${BUCKET} mounted, ${message}"
    exit 0
}

# unmount <mount dir>
unmount() {
    MNTPATH=$1
    if [ "$(ismounted)" -eq 0 ]; then
        message='{"status": "Success"}'
        log "${message}"
        logtofile "${MNTPATH} already unmounted, ${message}"
        exit 0
    fi

    fusermount -u "${MNTPATH}" &> /dev/null
    if [ $? -ne 0 ]; then
        message="{ \"status\": \"Failure\", \"message\": \"Failed to unmount at ${MNTPATH}\"}"
        err "${message}"
        logtofile "${message}"
        exit 1
    fi

    message='{"status": "Success"}'
    log "${message}"
    logtofile "${MNTPATH} unmounted, ${message}"
    exit 0
}

op=$1

if ! command -v jq >/dev/null 2>&1; then
    err "{ \"status\": \"Failure\", \"message\": \"'jq' binary not found. Please install jq package before using this driver\"}"
    exit 1
fi

if [ "$op" = "init" ]; then
    # No attach/detach support: only the Kubelet calls this driver.
    log '{"status": "Success", "capabilities": {"attach": false}}'
    exit 0
fi

if [ $# -lt 2 ]; then
    usage
fi

shift

# Pass the arguments through quoted so the JSON params stay one argument.
case "$op" in
    mount)
        domount "$@"
        ;;
    unmount)
        unmount "$@"
        ;;
    *)
        log '{"status": "Not supported"}'
        exit 0
        ;;
esac

exit 1

Install the script and its dependencies (cosfs, jq, etc.) on the machine, then test it:

apiVersion: apps/v1  # extensions/v1beta1 DaemonSets were removed in Kubernetes 1.16
kind: DaemonSet
metadata:
  name: test-cos
  namespace: default
spec:
  selector:  # required by apps/v1; must match the template labels
    matchLabels:
      app: test-cos
  template:
    metadata:
      name: test-cos
      labels:
        app: test-cos
    spec:
      containers:
      - name: test-cos
        image: busybox
        args:
        - /bin/sh
        - -c
        - ls /data
        volumeMounts:
        - name: test
          mountPath: /data
      volumes:
      - name: test
        flexVolume:
          driver: "k8s/cos"
          fsType: "cos"
          options:
            appid: "your appid"
            bucket: "your bucket"
            remote: "your remote, example: "
            dir: "your dir to mount"
            secretid: "your secretid"
            secretkey: "your secretkey"
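
For the `driver: "k8s/cos"` setting in the manifest to resolve, the script has to sit where the kubelet looks for FlexVolume drivers. The sketch below shows one way the install step could look. It assumes the kubelet's default plugin directory (configurable through its `--volume-plugin-dir` flag) and the `<vendor>~<driver>/<driver>` naming rule; the function names and the `FLEX_PLUGIN_DIR` variable are hypothetical.

```shell
# Default directory the kubelet scans for FlexVolume drivers (assumed default).
FLEX_PLUGIN_DIR="${FLEX_PLUGIN_DIR:-/usr/libexec/kubernetes/kubelet-plugins/volume/exec}"

# Map a driver name such as "k8s/cos" (vendor/driver) to its install path:
#   <plugin dir>/<vendor>~<driver>/<driver>
flex_driver_path() {
    vendor="${1%%/*}"
    driver="${1#*/}"
    printf '%s/%s~%s/%s\n' "$FLEX_PLUGIN_DIR" "$vendor" "$driver" "$driver"
}

# Copy the driver script into place and make it executable.
# $1 = script file, $2 = driver name as used in the Pod spec.
flex_install() {
    target=$(flex_driver_path "$2")
    mkdir -p "$(dirname "$target")"
    cp "$1" "$target"
    chmod 755 "$target"
    echo "$target"
}
```

Running `flex_install ./cos.sh k8s/cos` on each node (as root, with `./cos.sh` being the script above saved under a hypothetical file name) prints the path it installed to. This is the per-node deployment burden mentioned in the FlexVolume vs. CSI comparison.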

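The script here is fairly minimal: it implements only the mount/unmount commands that the kubelet needs to run, and it does not support PV/PVC or dynamic provisioning. For how to extend it, see the references.

Implementing a CSI plugin

Since we have already implemented a simple FlexVolume plugin above, we can use csi-flex-adapter to put together a simple CSI plugin quickly, by wrapping a similar FlexVolume plugin as a CSI plugin.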

# Build flexadapter and start it
app/flexadapter/flexadapter --endpoint tcp://127.0.0.1:10000 --drivername simplenfs --driverpath ./pkg/flexadapter/examples/simplenfs-flexdriver/driver/nfs --nodeid CSINode -v=5

# Download the csc tool for testing
GO111MODULE=off go get -u github.com/rexray/gocsi/csc

csc identity plugin-info --endpoint tcp://127.0.0.1:10000
"simplenfs"	"1.0.0-rc2"

csc node publish --endpoint tcp://127.0.0.1:10000 --target-path /mnt/nfs --pub-context server=10.0.0.4,share=nfs_share nfstestvol

References