Extend Kubernetes – FlexVolume And CSI
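- February 4, 2020
- Notes
Introduction
What are FlexVolume and CSI
Kubernetes volumes address the problem of storing state. State can be stored in many ways, and Kubernetes concerns itself with only a subset of them: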
| In scope (POSIX/SCSI) | Out of scope |
|---|---|
| File storage (NFS, SMB) | Object storage (S3, GCS, COS) |
| Block storage (Ceph RBD, AWS EBS) | SQL, NoSQL, TSDB |
| Files on block storage | Pub-sub systems (Kafka, AWS SNS) |
Volume plugins can in turn be divided into several categories:
| Category | Examples |
|---|---|
| Remote Storage | awsElasticBlockStore, azureDisk, azureFile, cephfs, cinder, … |
| Ephemeral Storage | EmptyDir; Secret, ConfigMap, DownwardAPI |
| Local | HostPath, Local PV |
| Out-Of-Tree | FlexVolume, CSI, other |
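- Volume operations generally follow the sequence: attach the volume to the node -> mount the volume into the pod
- This article focuses on the out-of-tree FlexVolume and CSI plugins
Where FlexVolume/CSI sit
To the VolumeManager, a FlexVolume or CSI plugin is just one more kind of plugin; it is used no differently from the others.
FlexVolume vs. CSI
- FlexVolume: implemented as a binary that Kubernetes executes, similar to CNI. It is simple and direct, but deployment can be awkward, and the deployed binary may have dependencies of its own.
- CSI: a more complex and more general design, similar to CNI/CRI, whose goal is to serve more than just Kubernetes. Implementation and deployment are correspondingly more complex.
- Going forward, all new storage plugins are recommended to be implemented as out-of-tree CSI drivers. Existing FlexVolume plugins will continue to be maintained and do not need to be migrated, although a migration path is provided: flexadapter.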
FlexVolume/CSI Plugin
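The basic process by which K8s mounts a volume, and the components involved:
- The user creates a Pod that references a PVC
- The Pod is scheduled onto node NodeA
- Kubelet waits for the Volume Manager to prepare the device
- The PV controller calls the corresponding volume plugin (in-tree or out-of-tree) to create the persistent volume, then creates the PV object in the cluster and binds it to the PVC (Provision)
- The Attach/Detach controller or the Volume Manager attaches the block device through the volume plugin (Attach)
- The Volume Manager waits for the attach to complete, then mounts the volume at a designated directory on the node (Mount), for example:
  - /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-xxxxxxxxxxxxxxxxx
- Once told that the device is ready, Kubelet starts the Pod's containers and maps the volume already mounted on the node into them, using options such as Docker's -v (volume mapping)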
FlexVolume
FlexVolume execution flow
Protocol and implementation
- The plugin binary must be deployed on the nodes in advance
- The binary must implement the following methods; mount parameters and similar options are passed to it as a JSON argument
  - init: called during Kubelet and Controller Manager initialization.
  - attach: called by the Controller Manager. Attaches the volume specified by the given spec on the given node.
  - detach: called by the Controller Manager. Detaches the volume from the node.
  - waitforattach: called by the Controller Manager. Waits for the volume to be attached on the remote node.
  - isattached: called by the Controller Manager. Checks whether the volume is attached on the node.
  - (un)mountdevice: called by Kubelet. mountdevice mounts the device at a global path, which individual pods can then bind mount.
  - (un)mount: called by Kubelet. mount mounts the volume at the mount dir.
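The call-out protocol above can be sketched as a small dispatcher: each call receives the method name as its first argument and answers with one JSON status object on stdout. This is only a minimal sketch; the function names `reply` and `flex_dispatch` are hypothetical, the mount and unmount branches do no real work, and the status strings follow the ones used by the cosfs script later in this article.

```shell
# Hypothetical skeleton of a FlexVolume driver's call-out handling.

# reply STATUS [MESSAGE] prints the JSON response for one call.
reply() {
    if [ -n "$2" ]; then
        printf '{"status": "%s", "message": "%s"}\n' "$1" "$2"
    else
        printf '{"status": "%s"}\n' "$1"
    fi
}

# flex_dispatch OP [ARGS...] maps a call-out name to a response.
flex_dispatch() {
    op=$1
    [ $# -gt 0 ] && shift
    case "$op" in
        init)
            # Declare that this driver has no attach/detach step.
            printf '{"status": "Success", "capabilities": {"attach": false}}\n'
            ;;
        mount)
            # $1 = mount dir, $2 = JSON options; a real driver would mount here.
            if [ $# -lt 2 ]; then
                reply "Failure" "mount needs <mount dir> <json params>"
                return 1
            fi
            reply "Success"
            ;;
        unmount)
            # $1 = mount dir; a real driver would unmount here.
            if [ $# -lt 1 ]; then
                reply "Failure" "unmount needs <mount dir>"
                return 1
            fi
            reply "Success"
            ;;
        *)
            # Call-outs this driver does not implement.
            reply "Not supported"
            ;;
    esac
}
```

A driver built this way would end with `flex_dispatch "$@"`, so that an invocation with the single argument `init` prints the capabilities object.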
CSI
CSI execution flow
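Deployment
- StatefulSet: a replica count of 1 ensures that only one instance runs. It contains three containers:
  - The user-implemented CSI driver plugin
  - External Attacher: a sidecar container provided by Kubernetes. It watches VolumeAttachment and PersistentVolume objects and calls the CSI plugin's ControllerPublishVolume and ControllerUnpublishVolume APIs to attach a volume to, or detach it from, the specified node
  - External Provisioner: a sidecar container provided by Kubernetes. It watches PersistentVolumeClaim objects and calls the CSI plugin's CreateVolume and DeleteVolume APIs to provision and delete volumes
- DaemonSet: runs the CSI plugin on every node so that Kubelet can call it. It contains two containers:
  - The user-implemented CSI driver plugin
  - Driver Registrar: registers the CSI plugin with kubelet and initializes the NodeId (that is, it adds the annotation csi.volume.kubernetes.io/nodeid to the Node object)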
CSI lifecycle: the figure below shows one case, a dynamically provisioned volume on a block device. For more complete examples, see the CSI spec.
       CreateVolume +------------+ DeleteVolume
     +------------->|  CREATED   +--------------+
     |              +---+----^---+              |
     |       Controller |    | Controller       v
    +++         Publish |    | Unpublish       +++
    |X|          Volume |    | Volume          | |
    +-+             +---v----+---+             +-+
                    | NODE_READY |
                    +---+----^---+
                   Node |    | Node
                  Stage |    | Unstage
                 Volume |    | Volume
                    +---v----+---+
                    |  VOL_READY |
                    +---+----^---+
                   Node |    | Node
                Publish |    | Unpublish
                 Volume |    | Volume
                    +---v----+---+
                    | PUBLISHED  |
                    +------------+

Figure 6: The lifecycle of a dynamically provisioned volume, from creation to destruction, when the Node Plugin advertises the STAGE_UNSTAGE_VOLUME capability.
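The lifecycle figure above can be read as a small state machine, which the following sketch encodes as a lookup from (state, RPC) to the next state. The state names come from the figure; `NONE` and `DELETED` are hypothetical labels for the unnamed start and end boxes, and the function names `next_state` and `walk` are illustrative only.

```shell
# next_state STATE RPC prints the state reached by applying RPC in STATE,
# or returns 1 when the figure has no such arrow.
next_state() {
    case "$1:$2" in
        NONE:CreateVolume)                    echo CREATED ;;
        CREATED:ControllerPublishVolume)      echo NODE_READY ;;
        NODE_READY:NodeStageVolume)           echo VOL_READY ;;
        VOL_READY:NodePublishVolume)          echo PUBLISHED ;;
        PUBLISHED:NodeUnpublishVolume)        echo VOL_READY ;;
        VOL_READY:NodeUnstageVolume)          echo NODE_READY ;;
        NODE_READY:ControllerUnpublishVolume) echo CREATED ;;
        CREATED:DeleteVolume)                 echo DELETED ;;
        *) return 1 ;;
    esac
}

# walk RPC... applies the calls in order, starting from NONE, and prints
# the final state; it returns 1 at the first call that is out of order.
walk() {
    state=NONE
    for rpc in "$@"; do
        state=$(next_state "$state" "$rpc") || return 1
    done
    echo "$state"
}
```

For example, `walk CreateVolume ControllerPublishVolume NodeStageVolume NodePublishVolume` follows the left-hand arrows down to `PUBLISHED`, while `walk CreateVolume NodePublishVolume` fails because the volume has not yet been attached and staged.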
Protocol and implementation
| Service | Purpose | Methods |
|---|---|---|
| Identity | Lets Kubernetes and the CSI plugin negotiate version information | GetPluginInfo, GetPluginCapabilities, Probe |
| Controller | Creates, deletes, and manages volumes | Create/DeleteVolume, ControllerPublish/UnpublishVolume, ListVolumes, CreateSnapshot, ControllerExpandVolume, … |
| Node | Mounts a volume at the specified directory so that Kubelet can use it when creating containers (the plugin must listen on /var/lib/kubelet/plugins/SanitizedCSIDriverName/csi.sock) | NodeStage/UnstageVolume, NodePublish/UnpublishVolume, NodeExpandVolume, … |
Functionality beyond these methods can be covered by pairing the driver with the sidecar containers implemented by the Kubernetes team.
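As a rough illustration of how the three services partition the RPCs, the sketch below maps an RPC name from the table to its service and to the full gRPC method path. It assumes CSI v1, where the protobuf package is `csi.v1`; the function names `csi_service_of` and `csi_grpc_method` are hypothetical, and only the RPCs named in the table are covered.

```shell
# csi_service_of RPC prints which CSI service defines the given RPC,
# or returns 1 for a name that is not listed in the table.
csi_service_of() {
    case "$1" in
        GetPluginInfo|GetPluginCapabilities|Probe)
            echo Identity ;;
        CreateVolume|DeleteVolume|ControllerPublishVolume|ControllerUnpublishVolume|ListVolumes|CreateSnapshot|ControllerExpandVolume)
            echo Controller ;;
        NodeStageVolume|NodeUnstageVolume|NodePublishVolume|NodeUnpublishVolume|NodeExpandVolume)
            echo Node ;;
        *)
            return 1 ;;
    esac
}

# csi_grpc_method RPC prints the full gRPC method path, assuming the
# csi.v1 package name used by CSI v1.
csi_grpc_method() {
    service=$(csi_service_of "$1") || return 1
    printf '/csi.v1.%s/%s\n' "$service" "$1"
}
```

A plugin that implements only the Node service, as many simple drivers do, would answer the `Node` and `Identity` paths and leave the `Controller` ones unimplemented.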
Common FlexVolume/CSI plugin implementations
FlexVolume Plugins
CSI Plugins
Some simple examples are available; most of them implement only the Node service.
Here we look at the implementation of gcp-compute-persistent-disk-csi-driver, a repository that implements the interface fairly completely:
- GCEIdentityServer
- GCENodeServer
  - Node(Un)PublishVolume -> Mount
  - Node(Un)StageVolume -> MountAndFormat
  - NodeExpandVolume -> Resizefs
- GCEControllerServer
  - Create/DeleteVolume -> Call CloudProvider to create or delete the volume
  - Controller(Un)PublishVolume -> Call CloudProvider to attach or detach the volume
  - CreateSnapshot -> Call CloudProvider to create a snapshot of the volume
  - ControllerExpandVolume -> Call CloudProvider to resize the disk
Practice
Implementing a FlexVolume plugin
The following example implements a FlexVolume driver on top of cosfs:
    #!/usr/bin/env bash

    # Notes:
    #  - Please install "jq" package before using this driver.
    #  - Please install "cosfs > 1.5.0" package before using this driver.

    # warning: do not edit this line, this may be replace when deploy.sh
    DEBUG_FLEX_COS="${DEBUG_FLEX_COS:-false}"

    usage() {
        err "Invalid usage. Usage: "
        err "\t$0 init"
        err "\t$0 mount <mount dir> <json params>"
        err "\t$0 unmount <mount dir>"
        exit 1
    }

    # Append a timestamped line to the driver's log file.
    logtofile() {
        echo "[$(date)]" $* >> /var/log/flexcos.log
    }

    # Write a message to stderr.
    err() {
        echo -ne $* 1>&2
    }

    # Write the JSON response to stdout.
    log() {
        echo -ne $* >&1
    }

    # Print 1 if ${MNTPATH} is currently a mount point, 0 otherwise.
    ismounted() {
        MOUNT=$(findmnt -n "${MNTPATH}" 2>/dev/null | cut -d' ' -f1)
        if [ "${MOUNT}" == "${MNTPATH}" ]; then
            echo "1"
        else
            echo "0"
        fi
    }

    domount() {
        MNTPATH=$1
        # The flexVolume options arrive as a single JSON argument.
        APPID=$(echo "$2" | jq -r '.["appid"]')
        BUCKET=$(echo "$2" | jq -r '.["bucket"]')
        REMOTE=$(echo "$2" | jq -r '.["remote"]')
        DIR=$(echo "$2" | jq -r '.["dir"]')
        SECRETID=$(echo "$2" | jq -r '.["secretid"]')
        SECRETKEY=$(echo "$2" | jq -r '.["secretkey"]')
        DEBUGLEVEL="${DEBUGLEVEL:-info}"

        # Normalize the sub-directory to the ":/path" suffix appended to the bucket name.
        if [[ "$DIR" != "null" ]] && [[ "$DIR" != "" ]]; then
            if [[ "$DIR" != /* ]]; then
                DIR=/${DIR}
            fi
        else
            DIR="/"
        fi
        DIR=:${DIR}

        if [ $(ismounted) -eq 1 ]; then
            message='{"status": "Success"}'
            log $message
            logtofile "${APPID}:${BUCKET} already mounted, ${message}"
            exit 0
        fi

        mkdir -p ${MNTPATH} &> /dev/null
        mkdir -p /data/cache/${MNTPATH}/cos

        echo "${BUCKET}-${APPID}:${SECRETID}:${SECRETKEY}" > /data/cache/${MNTPATH}/passwd
        chmod 600 /data/cache/${MNTPATH}/passwd

        if [ "$DEBUG_FLEX_COS" == "true" ]; then
            logtofile "cosfs ${BUCKET}-${APPID}${DIR} ${MNTPATH} -ourl=${REMOTE} -odbglevel=${DEBUGLEVEL} -oallow_other -ouse_cache=/data/cache/${MNTPATH}/cos -odel_cache -oensure_diskfree=5000 -opasswd_file=/data/cache/${MNTPATH}/passwd"
        fi

        out=$(cosfs ${BUCKET}-${APPID}${DIR} ${MNTPATH} -ourl=${REMOTE} -odbglevel=${DEBUGLEVEL} -oallow_other -ouse_cache=/data/cache/${MNTPATH}/cos -odel_cache -oensure_diskfree=5000 -opasswd_file=/data/cache/${MNTPATH}/passwd 2>&1)
        if [ $? -ne 0 ]; then
            message="{ \"status\": \"Failure\", \"message\": \"Failed to mount ${APPID}:${BUCKET} at ${MNTPATH}\"}"
            err ${message}
            logtofile ${message}${out}
            exit 1
        fi

        message='{"status": "Success"}'
        log ${message}
        logtofile "${APPID}:${BUCKET} mounted, ${message}"
        exit 0
    }

    unmount() {
        MNTPATH=$1
        if [ $(ismounted) -eq 0 ]; then
            message='{"status": "Success"}'
            log ${message}
            logtofile "${MNTPATH} already unmounted, ${message}"
            exit 0
        fi

        fusermount -u ${MNTPATH} &> /dev/null
        if [ $? -ne 0 ]; then
            message="{ \"status\": \"Failure\", \"message\": \"Failed to unmount at ${MNTPATH}\"}"
            err ${message}
            logtofile ${message}
            exit 1
        fi

        message='{"status": "Success"}'
        log ${message}
        logtofile "${MNTPATH} unmounted, ${message}"
        exit 0
    }

    op=$1

    if ! command -v jq >/dev/null 2>&1; then
        err "{ \"status\": \"Failure\", \"message\": \"'jq' binary not found. Please install jq package before using this driver\"}"
        exit 1
    fi

    if [ "$op" = "init" ]; then
        log '{"status": "Success", "capabilities": {"attach": false}}'
        exit 0
    fi

    if [ $# -lt 2 ]; then
        usage
    fi

    shift

    # "$@" keeps the JSON options as one argument even if they contain spaces.
    case "$op" in
        mount)
            domount "$@"
            ;;
        unmount)
            unmount "$@"
            ;;
        *)
            log '{"status": "Not supported"}'
            exit 0
    esac

    exit 1
Install the script and its dependencies (cosfs, jq, and so on) on the machine, then test it with a manifest such as:
    # apps/v1 is used here; it requires an explicit spec.selector.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: test-cos
      namespace: default
    spec:
      selector:
        matchLabels:
          app: test-cos
      template:
        metadata:
          name: test-cos
          labels:
            app: test-cos
        spec:
          containers:
          - name: test-cos
            image: busybox
            args:
            - /bin/sh
            - -c
            - ls /data
            volumeMounts:
            - name: test
              mountPath: /data
          volumes:
          - name: test
            flexVolume:
              driver: "k8s/cos"
              fsType: "cos"
              options:
                appid: "your appid"
                bucket: "your bucket"
                remote: "your remote, example: "
                dir: "your dir to mount"
                secretid: "your secretid"
                secretkey: "your secretkey"
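The driver name `k8s/cos` in the manifest also determines where kubelet looks for the script on each node. The helper below is a sketch of that naming rule, assuming kubelet's default plugin directory (it can be changed with kubelet's `--volume-plugin-dir` flag); the function name `flex_install_path` is hypothetical.

```shell
# flex_install_path DRIVER [PLUGIN_DIR] prints where kubelet expects the
# executable for a flexVolume driver named "<vendor>/<name>".
flex_install_path() {
    driver=$1
    # Assumed default; pass a second argument if kubelet uses another directory.
    plugin_dir=${2:-/usr/libexec/kubernetes/kubelet-plugins/volume/exec}
    vendor=${driver%%/*}   # text before the first "/"
    name=${driver#*/}      # text after the first "/"
    # The directory is named "<vendor>~<name>" and holds an executable "<name>".
    printf '%s/%s~%s/%s\n' "$plugin_dir" "$vendor" "$name" "$name"
}
```

Under these assumptions, `flex_install_path k8s/cos` prints `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/k8s~cos/cos`, which is where the cosfs script would be copied and made executable.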
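The script is deliberately minimal: it implements only the mount/unmount commands that kubelet executes and does not support PV/PVC or dynamic provisioning. For how to extend it, see the references.
Implementing a CSI plugin
Since we have already implemented a simple FlexVolume plugin, we can use csi-flex-adapter to quickly build a simple CSI plugin by wrapping a FlexVolume plugin of this kind.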
    # Build flexadapter, then start it
    app/flexadapter/flexadapter --endpoint tcp://127.0.0.1:10000 --drivername simplenfs --driverpath ./pkg/flexadapter/examples/simplenfs-flexdriver/driver/nfs --nodeid CSINode -v=5

    # Download the csc tool and use it to test
    GO111MODULE=off go get -u github.com/rexray/gocsi/csc

    csc identity plugin-info --endpoint tcp://127.0.0.1:10000
    "simplenfs" "1.0.0-rc2"

    csc node publish --endpoint tcp://127.0.0.1:10000 --target-path /mnt/nfs --pub-context server=10.0.0.4,share=nfs_share nfstestvol
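References
- Ali_Kasinadhuni_Managing_Disk_Volumes_in_Kubernetes
- Kubernetes存儲系統介紹及機制實現 (Introduction to the Kubernetes storage system and how its mechanisms are implemented)
- Kubernetes 之存儲學習整理 (Kubernetes storage study notes)
- kubernetes-storage-dynamic-volumes-and-the-container-storage-interface
- dynamically-expand-volume-with-csi-and-kubernetes
- volume-plugin-faq
- flexvolume.md
- csi-feisky
- Container Storage Interface 標準介紹 (Introduction to the Container Storage Interface standard)
- CSI-SPEC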