Extend Kubernetes – FlexVolume And CSI

Introduction

What are FlexVolume and CSI

Kubernetes volumes solve the problem of storing state. State can be stored in many ways, and Kubernetes concerns itself with only some of them.

| In scope (POSIX/SCSI) | Out of scope |
| --- | --- |
| File storage (nfs, smb) | Object storage (s3, gcs, cos) |
| Block storage (cephrbd, aws ebs) | SQL, NoSQL, TSDB |
| Files on block storage | Pub-Sub systems (kafka, aws sns) |

Volume plugins can in turn be divided into several categories:

| Category | Examples |
| --- | --- |
| Remote Storage | awsElasticBlockStore, azureDisk, azureFile, cephfs, cinder, … |
| Ephemeral Storage | EmptyDir; Secret, ConfigMap, DownwardAPI |
| Local | HostPath, Local PV |
| Out-Of-Tree | FlexVolume, CSI, Other |

  • Volume operations generally follow the sequence: attach the volume to the node -> mount the volume into the pod
  • This article focuses on the out-of-tree options, FlexVolume and CSI

Where FlexVolume/CSI sit


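From the VolumeManager's point of view, a FlexVolume/CSI plugin is just one more kind of plugin and is used no differently from the others.

FlexVolume vs. CSI

  • FlexVolume: implemented as an executable, similar to CNI. It is simple and direct, but deployment can be troublesome, and the deployed binary may have dependencies of its own.
  • CSI: a more complex and more general design, similar to CNI/CRI. Its goal is to serve more than just Kubernetes, but implementing and deploying it is more involved.
  • Going forward, the recommended way to implement any storage plugin is as an out-of-tree CSI driver. The existing FlexVolume mode will continue to be maintained and requires no migration, but a migration path is also provided: flexadapter.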

FlexVolume/CSI Plugin

The basic process by which Kubernetes mounts a volume (and the components involved):

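  1. The user creates a Pod that includes a PVC.
  2. The Pod is scheduled onto node NodeA.
  3. The kubelet waits for the Volume Manager to prepare the device.
  4. The PV controller calls the corresponding volume plugin (in-tree or out-of-tree) to create the persistent volume, creates the PV object in the system, and binds it to the PVC (Provision).
  5. The Attach/Detach controller or the Volume Manager attaches the block device through the volume plugin (Attach).
  6. The Volume Manager waits for the attach to complete and mounts the volume at a designated directory on the node (Mount), for example /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-xxxxxxxxxxxxxxxxx
  7. Once told that the device is ready, the kubelet starts the Pod's containers and maps the locally mounted volume into them using options such as Docker's -v (volume mapping).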

FlexVolume

FlexVolume execution flow


Protocol and implementation

  • The plugin binary must be deployed on each node in advance
  • The binary must implement the following operations; mount parameters and similar options are passed to it as JSON arguments
    • init: Called during Kubelet & Controller Manager initialization.
    • attach: Called by the Controller Manager. Attaches the volume specified by the given spec on the given node.
    • detach: Called by the Controller Manager. Detaches the volume from the node.
    • waitforattach: Called by the Controller Manager. Waits for the volume to be attached on the remote node.
    • isattached: Called by the Controller Manager. Checks whether the volume is attached on the node.
    • (un)mountdevice: Called by the Kubelet. mountdevice mounts the device to a global path, which individual pods can then bind mount.
    • (un)mount: Called by the Kubelet. Mounts the volume at the mount dir.
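
To make the call convention above concrete, here is a minimal sketch of a driver skeleton. It assumes the driver is invoked as `<driver> <operation> [args...]` and answers with a JSON status object on stdout. The function name `flex_dispatch` is hypothetical, and the sketch touches no real storage.

```shell
#!/usr/bin/env bash
# Minimal FlexVolume driver skeleton (illustrative only; flex_dispatch is a
# hypothetical name). It prints the JSON status object expected on stdout.

flex_dispatch() {
    local op="$1"
    case "$op" in
        init)
            # "attach": false declares that this driver has no
            # attach/detach step, so only mount/unmount are needed.
            echo '{"status": "Success", "capabilities": {"attach": false}}'
            ;;
        mount|unmount)
            # A real driver would mount or unmount the directory in "$2" here.
            echo '{"status": "Success"}'
            ;;
        *)
            # Operations this driver does not implement.
            echo '{"status": "Not supported"}'
            ;;
    esac
}

# Example call, as made during initialization:
flex_dispatch init
```

Answering "Not supported" for unimplemented operations lets a mount-only driver skip attach, detach, waitforattach, and isattached entirely.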

CSI

CSI execution flow


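Deployment

  • StatefulSet: the replica count is 1 to guarantee that only one instance runs. It contains three containers
    • The user-implemented CSI driver plugin
    • External Attacher: a sidecar container provided by Kubernetes. It watches VolumeAttachment and PersistentVolume objects for changes and calls the CSI plugin's ControllerPublishVolume and ControllerUnpublishVolume APIs to attach the volume to, or detach it from, the specified node
    • External Provisioner: a sidecar container provided by Kubernetes. It watches PersistentVolumeClaim objects for changes and calls the CSI plugin's CreateVolume and DeleteVolume APIs to manage volumes
  • DaemonSet: runs the CSI plugin on every node so that the kubelet can call it. It contains two containers
    • The user-implemented CSI driver plugin
    • Driver Registrar: registers the CSI plugin with the kubelet and initializes the NodeId (that is, adds the annotation csi.volume.kubernetes.io/nodeid to the Node object)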

CSI lifecycle. This is one example, for a dynamically provisioned volume backed by a block device; for more complete examples, see the reference.

   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----^---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+             +---v----+---+             +-+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
              Stage |    | Unstage
             Volume |    | Volume
                +---v----+---+
                |  VOL_READY |
                +---+----^---+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+

Figure 6: The lifecycle of a dynamically provisioned volume, from
creation to destruction, when the Node Plugin advertises the
STAGE_UNSTAGE_VOLUME capability.
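
The lifecycle in Figure 6 can also be read as a small state machine. The sketch below is only an illustration: `next_state` is a hypothetical helper, and `NONE` is a label introduced here for "the volume does not exist" (the figure draws that state as boxes without a name).

```shell
#!/usr/bin/env bash
# Sketch of the volume lifecycle in Figure 6 as a transition function.
# next_state is a hypothetical helper, not part of any CSI tooling.
# NONE is an assumed label for "volume does not exist".

next_state() {
    local state="$1" rpc="$2"
    case "$state:$rpc" in
        NONE:CreateVolume)                    echo CREATED ;;
        CREATED:ControllerPublishVolume)      echo NODE_READY ;;
        NODE_READY:NodeStageVolume)           echo VOL_READY ;;
        VOL_READY:NodePublishVolume)          echo PUBLISHED ;;
        PUBLISHED:NodeUnpublishVolume)        echo VOL_READY ;;
        VOL_READY:NodeUnstageVolume)          echo NODE_READY ;;
        NODE_READY:ControllerUnpublishVolume) echo CREATED ;;
        CREATED:DeleteVolume)                 echo NONE ;;
        *) echo "invalid transition: $state -> $rpc" >&2; return 1 ;;
    esac
}

# Walk a volume from creation to being published on a node.
state=NONE
for rpc in CreateVolume ControllerPublishVolume NodeStageVolume NodePublishVolume; do
    state=$(next_state "$state" "$rpc") || break
    echo "$rpc -> $state"
done
```

Any call that skips a step, such as NodePublishVolume on a volume that is only CREATED, is rejected, which mirrors the ordering the figure imposes.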

Protocol and implementation

SPEC

proto

| Service | Purpose | Methods |
| --- | --- | --- |
| Identity | Lets Kubernetes and the CSI plugin negotiate version information | GetPluginInfo, GetPluginCapabilities, Probe |
| Controller | Creates, deletes, and manages volumes | CreateVolume/DeleteVolume, ControllerPublishVolume/ControllerUnpublishVolume, ListVolumes, CreateSnapshot, ControllerExpandVolume, … |
| Node | Mounts a volume into the specified directory so the kubelet can use it when creating containers (must listen on /var/lib/kubelet/plugins/SanitizedCSIDriverName/csi.sock) | NodeStageVolume/NodeUnstageVolume, NodePublishVolume/NodeUnpublishVolume, NodeExpandVolume, … |

The remaining functionality can be covered by pairing the plugin with the sidecars implemented by the Kubernetes team.

Common FlexVolume/CSI plugin implementations

FlexVolume Plugins

Reference

CSI Plugins

Here are some simple examples; most of them implement only the Node service.

Here we can look at the implementation of gcp-compute-persistent-disk-csi-driver, which is fairly complete:

  • GCEIdentityServer
  • GCENodeServer
    • Node(Un)PublishVolume -> Mount
    • Node(Un)StageVolume -> MountAndFormat
    • NodeExpandVolume -> Resizefs
  • GCEControllerServer
    • Create/DeleteVolume -> Call CloudProvider to create Volume
    • Controller(Un)PublishVolume -> Call CloudProvider to attach Volume
    • CreateSnapshot -> Call CloudProvider to create a snapshot of the volume
    • ControllerExpandVolume -> Call CloudProvider to resize disk

Practice

Implementing a FlexVolume plugin

Here is an example that implements a FlexVolume on top of cosfs:

#!/usr/bin/env bash

# Notes:
#  - Please install "jq" package before using this driver.
#  - Please install "cosfs > 1.5.0" package before using this driver.

# warning: do not edit this line, it may be replaced by deploy.sh
DEBUG_FLEX_COS="${DEBUG_FLEX_COS:-false}"

usage() {
    err "Invalid usage. Usage:\n"
    err "\t$0 init\n"
    err "\t$0 mount <mount dir> <json params>\n"
    err "\t$0 unmount <mount dir>\n"
    exit 1
}

# Append a timestamped line to the driver's log file.
logtofile() {
    echo "[$(date)] $*" >> /var/log/flexcos.log
}

# Write to stderr.
err() {
    echo -ne "$*" 1>&2
}

# Write to stdout; this is where the JSON status is reported.
log() {
    echo -ne "$*" >&1
}

# Print "1" if ${MNTPATH} is currently a mount point, "0" otherwise.
ismounted() {
    MOUNT=$(findmnt -n "${MNTPATH}" 2>/dev/null | cut -d' ' -f1)
    if [ "${MOUNT}" == "${MNTPATH}" ]; then
        echo "1"
    else
        echo "0"
    fi
}

domount() {
    MNTPATH=$1
    # The mount options arrive as a JSON object in the second argument.
    APPID=$(echo "$2" | jq -r '.["appid"]')
    BUCKET=$(echo "$2" | jq -r '.["bucket"]')
    REMOTE=$(echo "$2" | jq -r '.["remote"]')
    DIR=$(echo "$2" | jq -r '.["dir"]')
    SECRETID=$(echo "$2" | jq -r '.["secretid"]')
    SECRETKEY=$(echo "$2" | jq -r '.["secretkey"]')
    DEBUGLEVEL="${DEBUGLEVEL:-info}"

    # Normalize the bucket sub-directory to an absolute path; default to "/".
    if [[ "$DIR" != "null" ]] && [[ "$DIR" != "" ]]; then
        if [[ "$DIR" != /* ]]; then
            DIR=/${DIR}
        fi
    else
        DIR="/"
    fi

    DIR=:${DIR}

    if [ "$(ismounted)" -eq 1 ]; then
        message='{"status": "Success"}'
        log "${message}"
        logtofile "${APPID}:${BUCKET} already mounted, ${message}"
        exit 0
    fi

    mkdir -p "${MNTPATH}" &> /dev/null
    mkdir -p "/data/cache/${MNTPATH}/cos"
    echo "${BUCKET}-${APPID}:${SECRETID}:${SECRETKEY}" > "/data/cache/${MNTPATH}/passwd"
    chmod 600 "/data/cache/${MNTPATH}/passwd"

    if [ "$DEBUG_FLEX_COS" == "true" ]; then
        logtofile "cosfs ${BUCKET}-${APPID}${DIR} ${MNTPATH} -ourl=${REMOTE} -odbglevel=${DEBUGLEVEL} -oallow_other -ouse_cache=/data/cache/${MNTPATH}/cos -odel_cache -oensure_diskfree=5000 -opasswd_file=/data/cache/${MNTPATH}/passwd"
    fi
    out=$(cosfs "${BUCKET}-${APPID}${DIR}" "${MNTPATH}" \
            -ourl="${REMOTE}" \
            -odbglevel="${DEBUGLEVEL}" \
            -oallow_other \
            -ouse_cache="/data/cache/${MNTPATH}/cos" \
            -odel_cache \
            -oensure_diskfree=5000 \
            -opasswd_file="/data/cache/${MNTPATH}/passwd" 2>&1)

    if [ $? -ne 0 ]; then
        message="{\"status\": \"Failure\", \"message\": \"Failed to mount ${APPID}:${BUCKET} at ${MNTPATH}\"}"
        err "${message}"
        logtofile "${message}${out}"
        exit 1
    fi

    message='{"status": "Success"}'
    log "${message}"
    logtofile "${APPID}:${BUCKET} mounted, ${message}"
    exit 0
}

unmount() {
    MNTPATH=$1
    if [ "$(ismounted)" -eq 0 ]; then
        message='{"status": "Success"}'
        log "${message}"
        logtofile "${MNTPATH} already unmounted, ${message}"
        exit 0
    fi

    fusermount -u "${MNTPATH}" &> /dev/null
    if [ $? -ne 0 ]; then
        message="{\"status\": \"Failure\", \"message\": \"Failed to unmount at ${MNTPATH}\"}"
        err "${message}"
        logtofile "${message}"
        exit 1
    fi

    message='{"status": "Success"}'
    log "${message}"
    logtofile "${MNTPATH} unmounted, ${message}"
    exit 0
}

op=$1

if ! command -v jq >/dev/null 2>&1; then
    err "{\"status\": \"Failure\", \"message\": \"'jq' binary not found. Please install jq package before using this driver\"}"
    exit 1
fi

if [ "$op" = "init" ]; then
    # This driver has no attach/detach step.
    log '{"status": "Success", "capabilities": {"attach": false}}'
    exit 0
fi

if [ $# -lt 2 ]; then
    usage
fi

shift

case "$op" in
    mount)
        domount "$@"
        ;;
    unmount)
        unmount "$@"
        ;;
    *)
        log '{"status": "Not supported"}'
        exit 0
        ;;
esac

exit 1

Install the script and its dependencies (cosfs, jq, etc.) on the machine, then test it.
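
For the test to work, the kubelet has to be able to find the script. A minimal sketch of where it is expected to live, assuming the default plugin directory and a driver name of the form vendor/driver (this article's driver is named "k8s/cos"). `flex_install_path` is a hypothetical helper, not part of Kubernetes.

```shell
#!/usr/bin/env bash
# Compute where the kubelet looks for a FlexVolume driver executable.
# flex_install_path is a hypothetical helper. The default plugin directory
# below can be changed with the kubelet's --volume-plugin-dir flag.
# Assumes the driver name has the form <vendor>/<driver>.

flex_install_path() {
    local driver="$1"   # e.g. "k8s/cos"
    local plugin_dir="${2:-/usr/libexec/kubernetes/kubelet-plugins/volume/exec}"
    local vendor="${driver%%/*}"
    local name="${driver##*/}"
    # The directory is named <vendor>~<driver>, the executable <driver>.
    echo "${plugin_dir}/${vendor}~${name}/${name}"
}

flex_install_path "k8s/cos"
```

For the driver name used in this article, this prints /usr/libexec/kubernetes/kubelet-plugins/volume/exec/k8s~cos/cos. The script should be copied to that path and made executable.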

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-cos
  namespace: default
spec:
  selector:
    matchLabels:
      app: test-cos
  template:
    metadata:
      name: test-cos
      labels:
        app: test-cos
    spec:
      containers:
      - name: test-cos
        image: busybox
        args:
        - /bin/sh
        - -c
        - ls /data
        volumeMounts:
        - name: test
          mountPath: /data
      volumes:
      - name: test
        flexVolume:
          driver: "k8s/cos"
          fsType: "cos"
          options:
            appid: "your appid"
            bucket: "your bucket"
            remote: "your remote, example: "
            dir: "your dir to mount"
            secretid: "your secretid"
            secretkey: "your secretkey"

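The script here is fairly minimal: it implements only the mount/unmount commands that the kubelet needs to run, and it supports neither PV/PVC nor dynamic provisioning. For how to extend it, see the reference.

Implementing a CSI plugin

Since we implemented a simple FlexVolume plugin above, we can use csi-flex-adapter to quickly build a simple CSI plugin by wrapping a similar FlexVolume plugin as a CSI plugin.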

# Build flexadapter and start it
app/flexadapter/flexadapter --endpoint tcp://127.0.0.1:10000 --drivername simplenfs --driverpath ./pkg/flexadapter/examples/simplenfs-flexdriver/driver/nfs --nodeid CSINode -v=5

# Download the csc tool for testing
GO111MODULE=off go get -u github.com/rexray/gocsi/csc

csc identity plugin-info --endpoint tcp://127.0.0.1:10000
"simplenfs"	"1.0.0-rc2"

csc node publish --endpoint tcp://127.0.0.1:10000 --target-path /mnt/nfs --pub-context server=10.0.0.4,share=nfs_share nfstestvol

References