Some Kubernetes development/implementation/usage tips - 2
- October 31, 2019
- Notes
Viewing a given resource
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes", or run kubectl proxy and browse the API in a web page
Controller logic (JobController as an example)
JobController's implementation logic is fairly simple, which makes it a good example of how a Controller is built
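The control loop JobController runs is the standard informer-plus-workqueue pattern: event handlers enqueue object keys, worker goroutines pop keys and call a sync function, and failed keys are requeued. Below is a stdlib-only sketch of that shape; Queue, RunWorker, and the key strings are illustrative stand-ins, not the real client-go workqueue API:

```go
package main

import "sync"

// Queue is a minimal deduplicating work queue, sketching the shape of
// client-go's workqueue used by controllers (simplified: no rate limiting,
// no "processing" set).
type Queue struct {
	mu    sync.Mutex
	dirty map[string]bool // keys currently waiting to be processed
	order []string
}

func NewQueue() *Queue { return &Queue{dirty: map[string]bool{}} }

// Add enqueues a key; duplicate keys are coalesced, as in the real workqueue.
func (q *Queue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	q.order = append(q.order, key)
}

// Get pops the next key, or ok=false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.dirty, key)
	return key, true
}

// RunWorker drains the queue, calling sync for each key -- the core loop a
// controller runs in several goroutines. Keys whose sync fails are requeued.
func RunWorker(q *Queue, sync func(key string) error) {
	for {
		key, ok := q.Get()
		if !ok {
			return
		}
		if err := sync(key); err != nil {
			q.Add(key) // requeue on error, as controllers do
		}
	}
}
```

The dedup in Add is what lets a controller absorb event storms: ten updates to the same Job while a worker is busy collapse into a single pending key.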


serviceaccount_controller and tokens_controller
- serviceaccount_controller: ensures every namespace has a default ServiceAccount, e.g. the configured "default"
- tokens_controller: ensures each ServiceAccount has a corresponding token Secret
Kubernetes configurable features (feature gates)
Whether each is enabled by default, and how mature it currently is, is listed in
pkg/features/kube_features.go
Where the kubectl code lives
In kubectl, the auth/convert/cp/get commands live under k8s.io/kubernetes/pkg, while the rest of the code lives under k8s.io/kubectl. This is because everything used to live in k8s.io/kubernetes/pkg and is gradually being moved to the staging directory; the move is not finished yet
How kubectl is implemented
The core of kubectl is in vendor/k8s.io/cli-runtime; the most important part is vendor/k8s.io/cli-runtime/pkg/resource/builder.go
The flow: construct a builder → set builder parameters → Do() sets up the visitors → Infos() retrieves and decorates the results
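The build-then-visit pipeline above can be sketched with a stdlib-only visitor chain. Info, ListVisitor, and FilteredVisitor here are simplified, hypothetical stand-ins with the same flavor as the cli-runtime types, not the real API:

```go
package main

// Info is a stripped-down stand-in for cli-runtime's resource.Info.
type Info struct {
	Namespace, Name string
}

type VisitorFunc func(*Info) error

// Visitor walks resource Infos and hands each to a VisitorFunc; decorators
// wrap an inner Visitor to filter or transform, which is how builder.Do()
// assembles its pipeline.
type Visitor interface {
	Visit(VisitorFunc) error
}

// ListVisitor visits a fixed slice of Infos (a stand-in for a server fetch).
type ListVisitor []*Info

func (l ListVisitor) Visit(fn VisitorFunc) error {
	for _, info := range l {
		if err := fn(info); err != nil {
			return err
		}
	}
	return nil
}

// FilteredVisitor decorates another Visitor, passing through only Infos
// that match the predicate -- the same wrapping trick cli-runtime uses
// for namespace/selector filtering.
type FilteredVisitor struct {
	Delegate Visitor
	Keep     func(*Info) bool
}

func (f FilteredVisitor) Visit(fn VisitorFunc) error {
	return f.Delegate.Visit(func(i *Info) error {
		if !f.Keep(i) {
			return nil
		}
		return fn(i)
	})
}
```

Each builder option essentially wraps one more decorator around the chain, and Infos() finally runs the whole chain and collects the results.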
```go
type RESTClientGetter interface {
	// ToRESTConfig returns restconfig
	ToRESTConfig() (*rest.Config, error)
	// ToDiscoveryClient returns discovery client
	// DiscoveryInterface holds the methods that discover server-supported API groups,
	// versions and resources.
	ToDiscoveryClient() (discovery.CachedDiscoveryInterface, error)
	// ToRESTMapper returns a restmapper
	// RESTMapper allows clients to map resources to kind, and map kind and version
	// to interfaces for manipulating those objects. It is primarily intended for
	// consumers of Kubernetes compatible REST APIs as defined in docs/devel/api-conventions.md.
	ToRESTMapper() (meta.RESTMapper, error)
	// ToRawKubeConfigLoader return kubeconfig loader as-is
	ToRawKubeConfigLoader() clientcmd.ClientConfig
}

// Result contains helper methods for dealing with the outcome of a Builder.
type Result struct {
	err     error
	visitor Visitor

	sources            []Visitor
	singleItemImplied  bool
	targetsSingleItems bool

	mapper       *mapper
	ignoreErrors []utilerrors.Matcher

	// populated by a call to Infos
	info []*Info
}
```

Core components of the kubelet


Diagram from https://feisky.gitbooks.io/kubernetes/components/kubelet.html (in the diagram, after the kubelet there is also a ContainerManager (an easily confused name) that sets up cgroups, device resources, and the like before genericRuntimeManager is invoked)
- PodWorkers: podWorkers handle syncing Pods in response to events.
- kubepod.Manager: podManager is a facade that abstracts away the various sources of pods this Kubelet services.
- eviction.Manager: Needed to observe and respond to situations that could impact node stability
- kubecontainer.ContainerCommandRunner: runs a command in a container, i.e. exec in container
- cadvisor: monitoring
- dnsConfigurer: setting up DNS resolver configuration when launching pods
- VolumePluginMgr: Volume plugins.
- probeManager/livenessManager: Handles container probing/ Manages container health check results.
- kubecontainer.ContainerGC: Policy for handling garbage collection of dead containers.
- images.ImageGCManager: Manager for image garbage collection.
- logs.ContainerLogManager: Manager for container logs.
- secret.Manager: Secret manager
- configmap.Manager: ConfigMap manager.
- certificate.Manager: Handles certificate rotations.
- status.Manager: Syncs pods statuses with apiserver; also used as a cache of statuses.
- volumemanager.VolumeManager: attach/mount/unmount/detach volumes for pods
- cloudprovider.Interface
- cloudresource.SyncManager
- kubecontainer.Runtime: Container runtime, GetPods/SyncPod/KillPod/GetPodStatus/ImageService….
- kubecontainer.StreamingRuntime: GetExec/GetAttach/GetPortForward
- RuntimeService:
- ContainerManager(Create/Start/Stop/List/Exec…Container)
- PodSandboxManager(Run/Stop/Remove..PodSandbox)
- ContainerStatsManager
- PodLifecycleEventGenerator: Generates pod events.
- oomwatcher.Watcher
- cm.ContainerManager: Start/SystemCgroupsLimit/GetNodeConfig/GetMountedSubsystems/GetQOSContainersInfo…
- pluginmanager.PluginManager
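Of the components above, PodWorkers is worth internalizing: it serializes syncs per pod by giving each pod UID its own goroutine fed from a channel, while distinct pods sync concurrently. A minimal stdlib sketch under that assumption (no update coalescing or termination handling, unlike the real kubelet; all names are illustrative):

```go
package main

import "sync"

// podWorkers sketches kubelet's PodWorkers: one goroutine per pod UID, fed
// by a buffered channel, so syncs for the same pod run in order while
// different pods proceed in parallel.
type podWorkers struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	updates map[string]chan string // pod UID -> queued updates
	syncPod func(uid, update string)
}

func newPodWorkers(syncPod func(uid, update string)) *podWorkers {
	return &podWorkers{updates: map[string]chan string{}, syncPod: syncPod}
}

// UpdatePod routes an update to the pod's dedicated worker, starting the
// worker goroutine on first sight of the UID.
func (p *podWorkers) UpdatePod(uid, update string) {
	p.mu.Lock()
	ch, ok := p.updates[uid]
	if !ok {
		ch = make(chan string, 16)
		p.updates[uid] = ch
		p.wg.Add(1)
		go func() { // dedicated worker for this pod
			defer p.wg.Done()
			for u := range ch {
				p.syncPod(uid, u)
			}
		}()
	}
	p.mu.Unlock()
	ch <- update
}

// Shutdown closes all per-pod channels and waits for the workers to drain.
func (p *podWorkers) Shutdown() {
	p.mu.Lock()
	for _, ch := range p.updates {
		close(ch)
	}
	p.mu.Unlock()
	p.wg.Wait()
}
```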
The kubelet's entry goroutines
kubelet.go
- ListenAndServe/ListenAndServeReadOnly: serve on ports 10250/10255
- ListenAndServePodResources: a gRPC server to serve the PodResources service
- For serviceIndexer/nodeIndexer: get local cache for service and node object
- containerGC/imageManager.GarbageCollection: periodic GarbageCollect; calls kubeGenericRuntimeManager.containerGC evictContainers/evictSandboxes/evictPodLogsDirectories / realImageGCManager.GarbageCollect
- pluginManager.Run: CSIPlugin/DevicePlugin
- cloudResourceSyncManager: sync node address
- volumeManager: runs a set of asynchronous loops that figure out which volumes need to be attached/mounted/unmounted/detached based on the pods scheduled on this node and makes it so.
- syncNodeStatus/fastStatusUpdateOnce/nodeLeaseController: two ways of reporting updateNodeStatus; the lease is lightweight and less likely to fail when the cluster's data volume grows large
- updateRuntimeUp: every 5s; initializes the runtime-dependent modules when the container runtime first comes up
- podKiller: every 1s; starts a goroutine responsible for killing pods (that are not properly handled by pod workers)
syncLoopIteration

```go
// Arguments:
// 1. configCh:       a channel to read config events from (from http/status/apiserver)
// 2. handler:        the SyncHandler to dispatch pods to (syncs state)
// 3. syncCh:         a channel to read periodic sync events from
// 4. housekeepingCh: a channel to read housekeeping events from
// 5. plegCh:         a channel to read PLEG updates from (container state changes: ContainerStarted/Died/Removed/...)
```
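The iteration itself is a select over those channels. A stdlib-only sketch of its shape, with the real SyncHandler replaced by a plain callback (simplified and illustrative; the real kubelet distinguishes many more event types):

```go
package main

// syncLoopIteration sketches one pass of kubelet's sync loop: block until
// any of the sources fires, then dispatch. Returns false when the config
// channel is closed, which stops the loop.
func syncLoopIteration(configCh <-chan string, plegCh <-chan string,
	syncCh <-chan struct{}, housekeepingCh <-chan struct{},
	handle func(source, event string)) bool {
	select {
	case u, ok := <-configCh:
		if !ok {
			return false // config channel closed: stop the loop
		}
		handle("config", u) // pod add/update/delete from the config sources
	case e := <-plegCh:
		handle("pleg", e) // container lifecycle event (started/died/removed)
	case <-syncCh:
		handle("sync", "periodic") // periodic re-sync of pods
	case <-housekeepingCh:
		handle("housekeeping", "cleanup") // orphaned pod/volume cleanup etc.
	}
	return true
}
```

Because select blocks until one case is ready, the loop consumes events in arrival order without polling.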
cgroup layout
https://zhuanlan.zhihu.com/p/38359775
```
# ubuntu 16.04; kubernetes v1.10.5
ubuntu@VM-0-12-ubuntu:~$ systemd-cgls
Control group /:
-.slice
├─init.scope
│ └─1 /sbin/init
├─system.slice
│ ├─avahi-daemon.service
│ │ ├─1268 avahi-daemon: running [VM-0-12-ubuntu.local
│ │ └─1283 avahi-daemon: chroot helpe
│ │ ... (omitted)
│ ├─dockerd.service
│ │ ├─ 5134 /usr/bin/dockerd --config-file=/etc/docker/daemon.json
│ │ ├─ 5143 docker-containerd --config /var/run/docker/containerd/containerd.toml
│ │ └─29537 docker-containerd-shim -namespace moby -workdir /data/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/303a0718c84995350d835f6e2d17036
│ │ ... (omitted)
│ ├─accounts-daemon.service
│ │ └─1262 /usr/lib/accountsservice/accounts-daemon
│ │ ... (omitted)
│ ├─NetworkManager.service
│ │ └─1287 /usr/sbin/NetworkManager --no-daemon
│ ├─kubelet.service
│ │ └─5239 /usr/bin/kubelet --cluster-dns=10.15.255.254 --network-plugin=cni --kube-reserved=cpu=80m,memory=1319Mi --cloud-config=/etc/kubernetes/qcloud.conf
│ ├─rsyslog.service
│ │ └─1251 /usr/sbin/rsyslogd -n
│ │ ... (omitted)
│ └─acpid.service
│   └─1293 /usr/sbin/acpid
├─user.slice
│ └─user-500.slice
│   ├─session-129315.scope
│   │ ├─27862 sshd: ubuntu [priv]
│   └─[email protected]
│     └─init.scope
│       ├─27870 /lib/systemd/systemd --user
│       └─27871 (sd-pam)
└─kubepods
  ├─burstable
  │ ├─pod5645ed58-e98f-11e9-8443-52540087514c
  │ │ ├─1f8f76dacb8334bd8d8ab2a7432d2cc250286ca6b5b73ab6dca9a845b77a3a09
  │ │ │ └─8958 /configmap-reload --webhook-url=http://localhost:9090/-/reload --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
  └─besteffort
    ├─pod3cf3ae0d-b7f4-11e9-8443-52540087514c
    │ ├─fde2178c5fa634206c2c86756c107c3de2828d2f90e2ea4c6a3b57f50c25267c
    │ │ └─5435 /pause
    │ └─5b4082efeb73ad102cc3fea33ff4c931c042a7120f0cd5277d46660aedffffde
    │   ├─ 5663 sh /install-cni.sh
    │   └─20347 sleep 3600
```

APIserver structure
A good reference: https://note.youdao.com/ynoteshare1/index.html?id=63f58c5e98634c8b3df9da2b024aacd5&type=note

Key flows
- CreateKubeAPIServer
  - completedConfig.InstallLegacyAPI: the api/all and api/legacy flags control all APIs and the legacy APIs respectively
  - completedConfig.InstallAPIs
    - apiGroupInfo = restStorageBuilder.NewRESTStorage: the important element is VersionedResourcesStorageMap map[string]map[string]rest.Storage, e.g. {"v1beta1": {"deployments": deploymentStorage.Deployment}}
      - taking "apps" as an example: if v1 is enabled: storageMap = RESTStorageProvider(storage_app).v1Storage
      - deploymentStorage = deploymentstore.NewStorage; storage["deployments"] = deploymentStorage.Deployment; deploymentStorage consists of XXXREST elements, explained below
    - GenericAPIServer.InstallAPIGroups
      - s.installAPIResources: the core method for installing an API, wiring API paths to their storage
        - apiGroupVersion.InstallREST
          - installer.Install()
            - registerResourceHandlers: associates every path in the storage map with its storage
              - e.g. actions = appendIf(actions, action{"GET", itemPath, nameParams, namer, false}, isGetter)
              - handler = restfulGetResource(getter, exporter, reqScope)
              - route := ws.GET(action.Path).To(handler).Doc(doc)...
      - s.DiscoveryGroupManager.AddGroup
      - s.Handler.GoRestfulContainer.Add(discovery.NewAPIGroupHandler(s.Serializer, apiGroup).WebService())
```go
// NewREST returns a RESTStorage object that will work against deployments.
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, *RollbackREST, error) {
	store := &genericregistry.Store{
		NewFunc:                  func() runtime.Object { return &apps.Deployment{} },
		NewListFunc:              func() runtime.Object { return &apps.DeploymentList{} },
		DefaultQualifiedResource: apps.Resource("deployments"),

		CreateStrategy: deployment.Strategy,
		UpdateStrategy: deployment.Strategy,
		DeleteStrategy: deployment.Strategy,

		TableConvertor: printerstorage.TableConvertor{TableGenerator: printers.NewTableGenerator().With(printersinternal.AddHandlers)},
	}
	options := &generic.StoreOptions{RESTOptions: optsGetter}
	if err := store.CompleteWithOptions(options); err != nil {
		return nil, nil, nil, err
	}

	statusStore := *store
	statusStore.UpdateStrategy = deployment.StatusStrategy

	return &REST{store, []string{"all"}}, &StatusREST{store: &statusStore}, &RollbackREST{store: store}, nil
}

type REST struct {
	*genericregistry.Store
	categories []string
}
```

genericregistry.Store defines NewFunc/NewListFunc, CreateStrategy, UpdateStrategy, and so on; its core is DryRunnableStorage, and the storage.Interface inside DryRunnableStorage is the actual CRUD entry point to the backing store:

```go
type DryRunnableStorage struct {
	Storage storage.Interface
	Codec   runtime.Codec
}
```

That Storage is a Cacher struct wrapping the real storage (etcd3/store). generic.StoreOptions.RESTOptions determines the backend store; it is part of completedConfig (genericapiserver.CompletedConfig) and is passed down level by level from the top, originating in buildGenericConfig <- createAggregatorConfig, master.Config -> completedConfig. Eventually one finds generic.RESTOptions.Decorator = genericregistry.StorageWithCacher(cacheSize), i.e. an etcd backend with a cache in front (when EnableWatchCache is on, which defaults to true). The cache implementation lives in vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go; the next section looks at this cache implementation concretely.
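The Decorator arrangement (a cache wrapped around the etcd3 store) is a plain decorator over a storage interface. A minimal sketch under that assumption, with made-up types standing in for Cacher and etcd3/store (not the real apiserver API):

```go
package main

// Storage is an illustrative stand-in for the apiserver's storage interface.
type Storage interface {
	Create(key, value string)
	Get(key string) (string, bool)
}

// etcdStore is a stand-in for etcd3/store.
type etcdStore struct{ data map[string]string }

func newEtcdStore() *etcdStore { return &etcdStore{data: map[string]string{}} }

func (s *etcdStore) Create(key, value string) { s.data[key] = value }
func (s *etcdStore) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

// cachedStore decorates another Storage: writes go through to the backend
// and update the cache; reads are answered from the cache when possible.
// This is the shape of StorageWithCacher, minus watch events and versions.
type cachedStore struct {
	backend Storage
	cache   map[string]string
	hits    int // cache hits, for illustration
}

func withCache(backend Storage) *cachedStore {
	return &cachedStore{backend: backend, cache: map[string]string{}}
}

func (c *cachedStore) Create(key, value string) {
	c.backend.Create(key, value)
	c.cache[key] = value
}

func (c *cachedStore) Get(key string) (string, bool) {
	if v, ok := c.cache[key]; ok {
		c.hits++
		return v, ok
	}
	return c.backend.Get(key) // miss: fall through to the backend
}
```

Because both layers satisfy the same interface, genericregistry.Store neither knows nor cares whether the cache is enabled, which is exactly why a config flag (EnableWatchCache) can switch it.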
The cache implementation inside the apiserver
Taking watch as an example; the consumer is vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go

| Operation | Handling |
|---|---|
| Create | etcd3/store: Create |
| Delete | etcd3/store: Delete |
| Watch | a watcher is registered on etcd3 to receive events; results are served from the cache |
| Get | when resourceVersion is "" go straight to the store; otherwise served from the cache (must wait until the cache reaches resourceVersion) |
| List | similar to Get |
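The Get row can be sketched as a dispatch function: an empty resourceVersion bypasses the cache, a non-empty one is answered by the cache once it has caught up. Here the real blocking wait is replaced by a store fallback for brevity; all names are illustrative, not the real cacher API:

```go
package main

import "strconv"

// Get sketches the dispatch rule: resourceVersion == "" reads the backing
// store (etcd) directly; otherwise the watch cache answers once its observed
// revision (cacheRev) has reached the requested one.
func Get(store, cache map[string]string, cacheRev int, key, resourceVersion string) (value string, fromCache bool) {
	if resourceVersion == "" {
		return store[key], false // direct read from the store
	}
	rv, err := strconv.Atoi(resourceVersion)
	if err == nil && cacheRev >= rv {
		return cache[key], true // cache has caught up: serve from cache
	}
	// The real cacher would block here until cacheRev >= rv; we simply
	// fall back to the store to keep the sketch linear.
	return store[key], false
}
```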
Debug Etcd
```shell
# download etcd
ETCD_VER=v3.4.0
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz &&
  tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1 &&
  rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

# set up the environment
export ETCDCTL_CERT=/etc/kubernetes/certs/kube-apiserver-etcd-client.crt
export ETCDCTL_KEY=/etc/kubernetes/certs/kube-apiserver-etcd-client.key
export ETCDCTL_CACERT=/etc/kubernetes/certs/kube-apiserver-etcd-ca.crt
export ETCDCTL_ENDPOINTS=https://etcd.cls-4lr4c4wx.ccs.tencent-cloud.com:2379

etcdctl get "" --prefix=true --limit=1                                  # get key and value
etcdctl get "" --prefix=true --keys-only --limit=100                    # get only keys
etcdctl get "/cls-4lr4c4wx/pods" --prefix=true --keys-only --limit=10   # get pod keys; cls-4lr4c4wx is the etcd prefix here
etcdctl get "/cls-4lr4c3wx/configmaps" --prefix=true --limit 1 --write-out="json"  # output as json
```
What the watch bookmark event in 1.16 means
For example, a client watching pods:

```
GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245&allowWatchBookmarks=true
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json

{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10596", ...}, ...}
}
{
  "type": "BOOKMARK",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"}}
}
```

If the watcher then restarts, a watcher that received the BOOKMARK can resume watching from resourceVersion=12746, while one that did not can only resume from resourceVersion=10596, even though the range 10596-12746 contained no events it cared about anyway.
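The benefit reduces to a tiny resume-point calculation: the client remembers the largest resourceVersion it has seen on any event, bookmarks included, and restarts its watch from there. A sketch with illustrative types (not client-go):

```go
package main

// event is a stripped-down watch event: its type and the resourceVersion
// carried in the embedded object's metadata.
type event struct {
	Type            string // "ADDED", "MODIFIED", "DELETED", "BOOKMARK"
	ResourceVersion int
}

// restartRV returns the resourceVersion a client should resume its watch
// from after consuming the given event stream. BOOKMARK events advance the
// resume point even when no object events arrived.
func restartRV(events []event) int {
	last := 0
	for _, e := range events {
		if e.ResourceVersion > last {
			last = e.ResourceVersion
		}
	}
	return last
}
```

Without the bookmark, the resume point is stuck at the last object event, so the apiserver must replay (or reject as too old) the entire gap on reconnect.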
How the apiserver implements the Aggregator

The aggregator itself is also a controller
