Extend Kubernetes – FlexVolume And CSI
- February 4, 2020
- Notes
Introduction
What are FlexVolume and CSI
Kubernetes volumes solve the problem of storing state. State can be stored in many ways, and Kubernetes concerns itself with only some of them.
| In scope (POSIX/SCSI) | Out of scope |
|---|---|
| File storage (nfs, smb) | Object storage (s3, gcs, cos) |
| Block storage (cephrbd, aws ebs) | SQL, NoSQL, TSDB |
| Files on block storage | Pub-Sub systems (kafka, aws sns) |
Volume plugins can in turn be divided into several categories:

| Category | Examples |
|---|---|
| Remote Storage | awsElasticBlockStore, azureDisk, azureFile, cephfs, cinder… |
| Ephemeral Storage | EmptyDir; Secret, ConfigMap, DownwardAPI |
| Local | HostPath, Local PV |
| Out-Of-Tree | Flex Volume, CSI, Other |
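- Volume operations generally follow the pattern: attach volume to node -> mount volume to pod
- This article focuses on the out-of-tree options, FlexVolume and CSI
Where FlexVolume/CSI sits
To the VolumeManager, a FlexVolume/CSI plugin is just another kind of plugin; it is used no differently from the others.
FlexVolume vs. CSI
- FlexVolume: implemented as a binary, similar to CNI. It is simple and direct, but deployment can be troublesome, and the deployed binary may have dependencies of its own.
- CSI: a more complex and general design. Like CNI/CRI, it is not meant for Kubernetes alone, but it is more complex to implement and deploy.
- Going forward, all new storage plugins are recommended to be implemented as out-of-tree CSI drivers. Existing FlexVolume drivers will continue to be maintained and need not migrate, but a migration path is available: flexadapter.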
FlexVolume/CSI Plugin
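The basic process by which Kubernetes mounts a volume, and the components involved:
- The user creates a Pod that includes a PVC
- The Pod is scheduled to node NodeA
- Kubelet waits for the Volume Manager to prepare the device
- The PV controller calls the corresponding volume plugin (in-tree or out-of-tree) to create the persistent volume, then creates the PV object in the system and binds it to the PVC (Provision)
- The Attach/Detach controller or the Volume Manager attaches the block device through the volume plugin (Attach)
- The Volume Manager waits for the attach to complete, then mounts the volume to a designated directory on the node (Mount), for example:
  - /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-xxxxxxxxxxxxxxxxx
- Once told the device is ready, kubelet starts the Pod's containers and maps the locally mounted volume into them using options such as Docker's -v (volume mapping)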
FlexVolume
FlexVolume execution flow
Protocol and implementation
- The plugin binary must be deployed on each node in advance
- The binary must implement the following operations; mount options and similar parameters are passed to it as JSON arguments
  - init: Called during Kubelet & Controller manager initialization.
  - attach: Called by the Controller Manager. Attach the volume specified by the given spec on the given node.
  - detach: Called by the Controller Manager. Detach the volume from the node.
  - waitforattach: Called by the Controller Manager. Wait for the volume to be attached on the remote node.
  - isattached: Called by the Controller Manager. Check the volume is attached on the node.
  - (un)mountdevice: Called by kubelet. Mount device mounts the device to a global path which individual pods can then bind mount.
  - (un)mount: Called by kubelet. Mount the volume at the mount dir.
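The call-out convention above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: `flex_dispatch` is a hypothetical function name, nothing is actually mounted, and the error messages are made up. It shows the operation name arriving as the first argument and a JSON status going back on stdout, with a "Not supported" status for operations the driver does not implement.

```shell
# Hypothetical sketch of the FlexVolume call-out surface; performs no real mounts.

# flex_dispatch <operation> [args...]
# Prints the JSON status a driver would write to stdout for the given call.
flex_dispatch() {
    case "${1:-}" in
        init)
            # Declare that this driver has no attach/detach step.
            echo '{"status": "Success", "capabilities": {"attach": false}}'
            ;;
        mount)
            # $2 is the mount dir, $3 the JSON options passed by kubelet.
            if [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
                echo '{"status": "Failure", "message": "mount needs <mount dir> <json options>"}'
                return 1
            fi
            echo '{"status": "Success"}'
            ;;
        unmount)
            # $2 is the mount dir.
            if [ -z "${2:-}" ]; then
                echo '{"status": "Failure", "message": "unmount needs <mount dir>"}'
                return 1
            fi
            echo '{"status": "Success"}'
            ;;
        *)
            # attach, detach, waitforattach, isattached, (un)mountdevice, ...
            echo '{"status": "Not supported"}'
            ;;
    esac
}
```

Calling `flex_dispatch init` prints the capabilities JSON, and `flex_dispatch attach` prints the "Not supported" status, which is how a mount-only driver answers the Controller Manager side of the protocol.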
CSI
CSI execution flow
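Deployment
- StatefulSet: a replica count of 1 ensures only one instance runs. It contains three containers:
  - The user-implemented CSI driver plugin
  - External Attacher: a sidecar container provided by Kubernetes. It watches VolumeAttachment and PersistentVolume objects and calls the CSI plugin's ControllerPublishVolume and ControllerUnpublishVolume APIs to attach a volume to, or detach it from, the given node
  - External Provisioner: a sidecar container provided by Kubernetes. It watches PersistentVolumeClaim objects and calls the CSI plugin's CreateVolume and DeleteVolume APIs to manage volumes
- DaemonSet: runs the CSI plugin on every node so that kubelet can call it. It contains two containers:
  - The user-implemented CSI driver plugin
  - Driver Registrar: registers the CSI plugin with kubelet and initializes the NodeId (that is, it adds the annotation csi.volume.kubernetes.io/nodeid to the Node object)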
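To make the division of labor in this deployment concrete, here is a minimal lookup sketch. `csi_caller_of` is a hypothetical helper name; it covers only the RPCs discussed in this article, and it assumes the provisioner drives CreateVolume/DeleteVolume, the attacher drives ControllerPublishVolume/ControllerUnpublishVolume, and kubelet drives the Node RPCs.

```shell
# csi_caller_of <rpc>
# Prints which component triggers the given CSI RPC in the deployment
# described above. Returns 1 for RPCs this sketch does not cover.
csi_caller_of() {
    case "${1:-}" in
        CreateVolume|DeleteVolume)
            echo external-provisioner ;;
        ControllerPublishVolume|ControllerUnpublishVolume)
            echo external-attacher ;;
        NodeStageVolume|NodeUnstageVolume|NodePublishVolume|NodeUnpublishVolume)
            echo kubelet ;;
        *)
            return 1 ;;
    esac
}
```

The first two groups reach the driver copy running in the StatefulSet, while the kubelet calls reach the copy running in the DaemonSet on that node.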
CSI lifecycle. Shown here is one example, a dynamically provisioned volume on a block device; the CSI spec contains the complete set of lifecycle examples.
```
   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----+---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+             +---v----+---+             +-+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
              Stage |    | Unstage
             Volume |    | Volume
                +---v----+---+
                | VOL_READY  |
                +------------+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+
```

Figure 6: The lifecycle of a dynamically provisioned volume, from creation to destruction, when the Node Plugin advertises the STAGE_UNSTAGE_VOLUME capability.
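The lifecycle in the figure can be read as a small state machine. The sketch below encodes only the transitions drawn there; the function names `csi_next_state` and `csi_walk`, and the pseudo-states `NONE` (before creation) and `DELETED` (after deletion), are hypothetical labels introduced for illustration.

```shell
# csi_next_state <current state> <rpc>
# Prints the state reached after a successful RPC, or returns 1 when the
# figure has no such transition.
csi_next_state() {
    case "$1:$2" in
        NONE:CreateVolume)                    echo CREATED ;;
        CREATED:ControllerPublishVolume)      echo NODE_READY ;;
        NODE_READY:NodeStageVolume)           echo VOL_READY ;;
        VOL_READY:NodePublishVolume)          echo PUBLISHED ;;
        PUBLISHED:NodeUnpublishVolume)        echo VOL_READY ;;
        VOL_READY:NodeUnstageVolume)          echo NODE_READY ;;
        NODE_READY:ControllerUnpublishVolume) echo CREATED ;;
        CREATED:DeleteVolume)                 echo DELETED ;;
        *) return 1 ;;
    esac
}

# csi_walk <rpc>...
# Applies the RPCs in order, starting from NONE, and prints the final state.
# Stops with an error on the first RPC that is not valid in the current state.
csi_walk() {
    state=NONE
    for rpc in "$@"; do
        next=$(csi_next_state "$state" "$rpc") || {
            echo "invalid transition: $rpc from $state" >&2
            return 1
        }
        state=$next
    done
    echo "$state"
}
```

For example, `csi_walk CreateVolume ControllerPublishVolume NodeStageVolume NodePublishVolume` ends in `PUBLISHED`, while skipping straight from `CreateVolume` to `NodePublishVolume` is rejected because the volume has not been attached and staged yet.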
Protocol and implementation

| Service | Purpose | Methods |
|---|---|---|
| Identity | Lets Kubernetes and the CSI plugin negotiate version information | GetPluginInfo/GetPluginCapabilities/Probe |
| Controller | Creates, deletes, and manages volumes | Create/DeleteVolume, ControllerPublish/UnpublishVolume, ListVolumes, CreateSnapshot, ControllerExpandVolume… |
| Node | Mounts volumes into the designated directory for kubelet to use when creating containers (must listen on /var/lib/kubelet/plugins/SanitizedCSIDriverName/csi.sock) | NodeStage/UnstageVolume, NodePublish/UnpublishVolume, NodeExpandVolume… |

The remaining functionality can be covered by pairing the driver with the sidecars implemented by the Kubernetes team.
Common FlexVolume/CSI plugin implementations
FlexVolume Plugins
CSI Plugins
There are some simple examples available; most of them implement only the Node service.
Here we can look at the implementation of gcp-compute-persistent-disk-csi-driver, which is fairly complete:
- GCEIdentityServer
- GCENodeServer
  - Node(Un)PublishVolume -> Mount
  - Node(Un)StageVolume -> MountAndFormat
  - NodeExpandVolume -> Resizefs
- GCEControllerServer
  - Create/DeleteVolume -> Call CloudProvider to create Volume
  - Controller(Un)PublishVolume -> Call CloudProvider to attach Volume
  - CreateSnapshot -> Call CloudProvider to create a snapshot of the volume
  - ControllerExpandVolume -> Call CloudProvider to resize disk
Practice
Implementing a FlexVolume plugin
Here is an example that implements a FlexVolume driver on top of cosfs:
```bash
#!/usr/bin/env bash

# Notes:
#  - Please install "jq" package before using this driver.
#  - Please install "cosfs > 1.5.0" package before using this driver.

# warning: do not edit this line, this may be replaced when running deploy.sh
DEBUG_FLEX_COS="${DEBUG_FLEX_COS:-false}"

usage() {
    err "Invalid usage. Usage: "
    err "\t$0 init"
    err "\t$0 mount <mount dir> <json params>"
    err "\t$0 unmount <mount dir>"
    exit 1
}

logtofile() {
    echo [`date`] $* >> /var/log/flexcos.log
}

err() {
    echo -ne $* 1>&2
}

log() {
    echo -ne $* >&1
}

# Prints "1" if ${MNTPATH} is currently a mount point, "0" otherwise.
ismounted() {
    MOUNT=`findmnt -n ${MNTPATH} 2>/dev/null | cut -d' ' -f1`
    if [ "${MOUNT}" == "${MNTPATH}" ]; then
        echo "1"
    else
        echo "0"
    fi
}

domount() {
    MNTPATH=$1
    # $2 is the JSON options blob from kubelet; quote it so it is not word-split.
    APPID=$(echo "$2" | jq -r '.["appid"]')
    BUCKET=$(echo "$2" | jq -r '.["bucket"]')
    REMOTE=$(echo "$2" | jq -r '.["remote"]')
    DIR=$(echo "$2" | jq -r '.["dir"]')
    SECRETID=$(echo "$2" | jq -r '.["secretid"]')
    SECRETKEY=$(echo "$2" | jq -r '.["secretkey"]')
    DEBUGLEVEL="${DEBUGLEVEL:-info}"

    # Normalize DIR to an absolute path inside the bucket, defaulting to "/".
    if [[ "$DIR" != "null" ]] && [[ "$DIR" != "" ]]; then
        if [[ "$DIR" != /* ]]; then
            DIR=/${DIR}
        fi
    else
        DIR="/"
    fi
    DIR=:${DIR}

    if [ $(ismounted) -eq 1 ] ; then
        message='{"status": "Success"}'
        log $message
        logtofile "${APPID}:${BUCKET} already mounted, ${message}"
        exit 0
    fi

    mkdir -p ${MNTPATH} &> /dev/null
    mkdir -p /data/cache/${MNTPATH}/cos
    echo "${BUCKET}-${APPID}:${SECRETID}:${SECRETKEY}" > /data/cache/${MNTPATH}/passwd
    chmod 600 /data/cache/${MNTPATH}/passwd

    if [ "$DEBUG_FLEX_COS" == "true" ]; then
        logtofile "cosfs ${BUCKET}-${APPID}${DIR} ${MNTPATH} -ourl=${REMOTE} -odbglevel=${DEBUGLEVEL} -oallow_other -ouse_cache=/data/cache/${MNTPATH}/cos -odel_cache -oensure_diskfree=5000 -opasswd_file=/data/cache/${MNTPATH}/passwd"
    fi

    out=$(cosfs ${BUCKET}-${APPID}${DIR} ${MNTPATH} -ourl=${REMOTE} -odbglevel=${DEBUGLEVEL} -oallow_other -ouse_cache=/data/cache/${MNTPATH}/cos -odel_cache -oensure_diskfree=5000 -opasswd_file=/data/cache/${MNTPATH}/passwd 2>&1)
    if [ $? -ne 0 ]; then
        message="{ \"status\": \"Failure\", \"message\": \"Failed to mount ${APPID}:${BUCKET} at ${MNTPATH}\"}"
        err ${message}
        logtofile ${message}${out}
        exit 1
    fi

    message='{"status": "Success"}'
    log ${message}
    logtofile "${APPID}:${BUCKET} mounted, ${message}"
    exit 0
}

unmount() {
    MNTPATH=$1
    if [ $(ismounted) -eq 0 ] ; then
        message='{"status": "Success"}'
        log ${message}
        # Only the mount path is known here; APPID and BUCKET are not passed to unmount.
        logtofile "${MNTPATH} already unmounted, ${message}"
        exit 0
    fi

    fusermount -u ${MNTPATH} &> /dev/null
    if [ $? -ne 0 ]; then
        message="{ \"status\": \"Failure\", \"message\": \"Failed to unmount at ${MNTPATH}\"}"
        err ${message}
        logtofile ${message}
        exit 1
    fi

    message='{"status": "Success"}'
    log ${message}
    logtofile "${MNTPATH} unmounted, ${message}"
    exit 0
}

op=$1

if ! command -v jq >/dev/null 2>&1; then
    err "{ \"status\": \"Failure\", \"message\": \"'jq' binary not found. Please install jq package before using this driver\"}"
    exit 1
fi

if [ "$op" = "init" ]; then
    log '{"status": "Success", "capabilities": {"attach": false}}'
    exit 0
fi

if [ $# -lt 2 ]; then
    usage
fi

shift

case "$op" in
    mount)
        # "$@" keeps the JSON options as a single argument even if it contains spaces.
        domount "$@"
        ;;
    unmount)
        unmount "$@"
        ;;
    *)
        log '{"status": "Not supported"}'
        exit 0
esac

exit 1
```
Install the script and its dependencies (cosfs, jq, and so on) on the node, then test it:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-cos
  namespace: default
spec:
  # apps/v1 requires an explicit selector matching the pod template labels
  selector:
    matchLabels:
      app: test-cos
  template:
    metadata:
      name: test-cos
      labels:
        app: test-cos
    spec:
      containers:
        - name: test-cos
          image: busybox
          args:
            - /bin/sh
            - -c
            - ls /data
          volumeMounts:
            - name: test
              mountPath: /data
      volumes:
        - name: test
          flexVolume:
            driver: "k8s/cos"
            fsType: "cos"
            options:
              appid: "your appid"
              bucket: "your bucket"
              remote: "your remote, example: "
              dir: "your dir to mount"
              secretid: "your secretid"
              secretkey: "your secretkey"
```
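The script here is fairly minimal: it implements only the mount/unmount commands that kubelet needs to run, and it does not support PV/PVC or dynamic provisioning. For ways to extend it, see the references.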
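One part of the driver that is easy to misread is how `domount` turns the `bucket`, `appid`, and `dir` options into the source argument handed to cosfs. The sketch below isolates that logic so it can be tried without jq or cosfs installed. `cos_source` is a hypothetical helper name, not part of the driver, and the values in the usage note are made up.

```shell
# cos_source <bucket> <appid> <dir>
# Prints the "<bucket>-<appid>:<dir>" source string, normalizing <dir>
# the same way the driver's domount function does.
cos_source() {
    bucket="$1"
    appid="$2"
    dir="${3:-}"
    case "$dir" in
        ""|null) dir="/" ;;       # jq -r prints "null" when the key is missing
        /*)      ;;               # already absolute, keep as-is
        *)       dir="/$dir" ;;   # make relative paths absolute
    esac
    echo "${bucket}-${appid}:${dir}"
}
```

With made-up values, `cos_source mybucket 1250000000 data` prints `mybucket-1250000000:/data`, and leaving `dir` empty falls back to the bucket root, `mybucket-1250000000:/`.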
Implementing a CSI plugin
Since we have already implemented a simple FlexVolume plugin, we can use csi-flex-adapter to quickly build a simple CSI plugin by wrapping a FlexVolume plugin of this kind.
```bash
# Build flexadapter and start it
app/flexadapter/flexadapter --endpoint tcp://127.0.0.1:10000 \
    --drivername simplenfs \
    --driverpath ./pkg/flexadapter/examples/simplenfs-flexdriver/driver/nfs \
    --nodeid CSINode -v=5

# Download the csc tool for testing
GO111MODULE=off go get -u github.com/rexray/gocsi/csc

csc identity plugin-info --endpoint tcp://127.0.0.1:10000
# "simplenfs" "1.0.0-rc2"

csc node publish --endpoint tcp://127.0.0.1:10000 --target-path /mnt/nfs \
    --pub-context server=10.0.0.4,share=nfs_share nfstestvol
```
References
- Ali_Kasinadhuni_Managing_Disk_Volumes_in_Kubernetes
- Kubernetes存储系统介绍及机制实现
- Kubernetes 之存储学习整理
- kubernetes-storage-dynamic-volumes-and-the-container-storage-interface
- dynamically-expand-volume-with-csi-and-kubernetes
- volume-plugin-faq
- flexvolume.md
- csi-feisky
- Container Storage Interface 標準介紹
- CSI-SPEC