Some Kubernetes development/implementation/usage tips (2)

  • October 31, 2019
  • Notes

Viewing a given resource type

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes", or run kubectl proxy and view it in a browser

Controller logic (JobController as an example)

JobController's implementation logic is fairly simple, so it serves here as an example of how a Controller is implemented.

serviceaccount_controller 和 tokens_controller

  • serviceaccount_controller: ensures every namespace has a default serviceaccount, e.g. the configured "default"
  • tokens_controller: ensures every serviceaccount has a corresponding token secret
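The reconcile idea behind serviceaccount_controller can be sketched as follows; the types and the in-memory "cluster" are illustrative stand-ins for the real informer/client machinery, not the controller's actual API:

```go
package main

import "fmt"

// cluster simulates the state the controller reconciles against:
// namespace -> set of existing ServiceAccount names.
type cluster struct {
	serviceAccounts map[string]map[string]bool
}

// ensureDefaults makes sure the configured default ServiceAccounts
// exist in a namespace, creating any that are missing (the real
// controller would POST a ServiceAccount to the apiserver here).
func (c *cluster) ensureDefaults(namespace string, defaults []string) {
	if c.serviceAccounts[namespace] == nil {
		c.serviceAccounts[namespace] = map[string]bool{}
	}
	for _, name := range defaults {
		if !c.serviceAccounts[namespace][name] {
			c.serviceAccounts[namespace][name] = true
		}
	}
}

func main() {
	c := &cluster{serviceAccounts: map[string]map[string]bool{}}
	c.ensureDefaults("kube-system", []string{"default"})
	fmt.Println(c.serviceAccounts["kube-system"]["default"]) // true
}
```

The real controller is driven by namespace and serviceaccount informer events rather than an explicit loop, but the reconcile step is this same "create if missing" check.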

Configurable Kubernetes features

Whether each feature is enabled by default, and its current maturity:

pkg/features/kube_features.go
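A minimal sketch of what the feature-gate table in that file looks like: each feature name maps to a default on/off state and a maturity stage (alpha/beta/GA). The spec struct and the feature names below are illustrative, not the real entries:

```go
package main

import "fmt"

type prerelease string

const (
	alpha prerelease = "ALPHA"
	beta  prerelease = "BETA"
	ga    prerelease = "" // GA features carry no prerelease tag
)

// featureSpec mirrors the idea of a default state plus maturity.
type featureSpec struct {
	Default    bool
	PreRelease prerelease
}

// defaultFeatureGates follows the style of kube_features.go;
// the entries here are made up for illustration.
var defaultFeatureGates = map[string]featureSpec{
	"ExampleAlphaFeature": {Default: false, PreRelease: alpha},
	"ExampleBetaFeature":  {Default: true, PreRelease: beta},
}

// enabled reports the default state of a feature; unknown
// features fall back to the zero value (disabled).
func enabled(name string) bool {
	return defaultFeatureGates[name].Default
}

func main() {
	fmt.Println(enabled("ExampleBetaFeature")) // true
}
```

In the real file, alpha features default to off and beta/GA features typically default to on; `--feature-gates=Name=true|false` overrides the defaults at component startup.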

Where the kubectl code lives

In kubectl, auth/convert/cp/get live under k8s.io/kubernetes/pkg, while the rest of the code lives under k8s.io/kubectl. This is because everything used to be in k8s.io/kubernetes/pkg and has been gradually moving to the staging directory; the move is not finished yet.

How kubectl is implemented

The core of kubectl lives in vendor/k8s.io/cli-runtime; the most important file is vendor/k8s.io/cli-runtime/pkg/resource/builder.go

Construct the builder -> set the builder's parameters -> Do() sets up the visitors -> Infos() retrieves and decorates the results

```go
type RESTClientGetter interface {
	// ToRESTConfig returns restconfig
	ToRESTConfig() (*rest.Config, error)
	// ToDiscoveryClient returns discovery client
	// DiscoveryInterface holds the methods that discover server-supported API groups,
	// versions and resources.
	ToDiscoveryClient() (discovery.CachedDiscoveryInterface, error)
	// ToRESTMapper returns a restmapper
	// RESTMapper allows clients to map resources to kind, and map kind and version
	// to interfaces for manipulating those objects. It is primarily intended for
	// consumers of Kubernetes compatible REST APIs as defined in docs/devel/api-conventions.md.
	ToRESTMapper() (meta.RESTMapper, error)
	// ToRawKubeConfigLoader return kubeconfig loader as-is
	ToRawKubeConfigLoader() clientcmd.ClientConfig
}

// Result contains helper methods for dealing with the outcome of a Builder.
type Result struct {
	err     error
	visitor Visitor

	sources            []Visitor
	singleItemImplied  bool
	targetsSingleItems bool

	mapper       *mapper
	ignoreErrors []utilerrors.Matcher

	// populated by a call to Infos
	info []*Info
}
```
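The builder chain is built on the Visitor pattern: Do() wraps decorating visitors around the result sources, and walking the visitors produces the Infos. A minimal self-contained sketch of that decorate-then-visit idea (the types here are simplified stand-ins for resource.Visitor/resource.Info):

```go
package main

import "fmt"

type info struct{ Name string }

type visitorFunc func(*info) error

type visitor interface {
	Visit(visitorFunc) error
}

// listVisitor yields a fixed set of infos, standing in for a source.
type listVisitor []*info

func (l listVisitor) Visit(fn visitorFunc) error {
	for _, i := range l {
		if err := fn(i); err != nil {
			return err
		}
	}
	return nil
}

// decoratingVisitor mutates each info before handing it on, the way
// the Builder decorates results with mappings and names.
type decoratingVisitor struct {
	delegate visitor
	decorate visitorFunc
}

func (d decoratingVisitor) Visit(fn visitorFunc) error {
	return d.delegate.Visit(func(i *info) error {
		if err := d.decorate(i); err != nil {
			return err
		}
		return fn(i)
	})
}

func main() {
	src := listVisitor{{Name: "deploy/nginx"}}
	v := decoratingVisitor{delegate: src, decorate: func(i *info) error {
		i.Name = "decorated:" + i.Name
		return nil
	}}
	_ = v.Visit(func(i *info) error { fmt.Println(i.Name); return nil })
}
```

Stacking visitors this way is what lets the builder compose filename/URL/label-selector sources with decoration and error handling without any source knowing about the others.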

Core components of the kubelet

[Figure: kubelet core components]

Figure from https://feisky.gitbooks.io/kubernetes/components/kubelet.html (after the kubelet in the figure there is also a ContainerManager (an easily confused name) that sets cgroup, device resource and similar information before genericRuntimeManager is called)

  • PodWorkers: podWorkers handle syncing Pods in response to events.
  • kubepod.Manager: podManager is a facade that abstracts away the various sources of pods this Kubelet services.
  • eviction.Manager: Needed to observe and respond to situations that could impact node stability
  • kubecontainer.ContainerCommandRunner: runs a command in a container, i.e. exec in container
  • cadvisor: monitoring
  • dnsConfigurer: setting up DNS resolver configuration when launching pods
  • VolumePluginMgr: Volume plugins.
  • probeManager/livenessManager: Handles container probing/ Manages container health check results.
  • kubecontainer.ContainerGC: Policy for handling garbage collection of dead containers.
  • images.ImageGCManager: Manager for image garbage collection.
  • logs.ContainerLogManager: Manager for container logs.
  • secret.Manager: Secret manager
  • configmap.Manager: ConfigMap manager.
  • certificate.Manager: Handles certificate rotations.
  • status.Manager: Syncs pods statuses with apiserver; also used as a cache of statuses.
  • volumemanager.VolumeManager: attach/mount/unmount/detach volumes for pods
  • cloudprovider.Interface
  • cloudresource.SyncManager
  • kubecontainer.Runtime: Container runtime, GetPods/SyncPod/KillPod/GetPodStatus/ImageService….
  • kubecontainer.StreamingRuntime: GetExec/GetAttach/GetPortForward
  • RuntimeService:
    • ContainerManager(Create/Start/Stop/List/Exec…Container)
    • PodSandboxManager(Run/Stop/Remove..PodSandbox)
    • ContainerStatsManager
  • PodLifecycleEventGenerator: Generates pod events.
  • oomwatcher.Watcher
  • cm.ContainerManager: Start/SystemCgroupsLimit/GetNodeConfig/GetMountedSubsystems/GetQOSContainersInfo…
  • pluginmanager.PluginManager

Entry threads of the kubelet

kubelet.go

  • ListenAndServe/ListenAndServeReadOnly: serve ports 10250/10255
  • ListenAndServePodResources: a gRPC server to serve the PodResources service
  • For serviceIndexer/nodeIndexer: get local cache for service and node object
  • containerGC/imageManager.GarbageCollection: periodic GarbageCollect; calls kubeGenericRuntimeManager.containerGC evictContainers/evictSandboxes/evictPodLogsDirectories / realImageGCManager.GarbageCollect
  • pluginManager.Run: CSIPlugin/DevicePlugin
  • cloudResourceSyncManager: sync node address
  • volumeManager: runs a set of asynchronous loops that figure out which volumes need to be attached/mounted/unmounted/detached based on the pods scheduled on this node and makes it so.
  • syncNodeStatus/fastStatusUpdateOnce/nodeLeaseController: two ways of reporting node status via updateNodeStatus; the lease path is lightweight and less likely to fail when the cluster carries a large amount of data
  • updateRuntimeUp: every 5s , initializing the runtime dependent modules when the container runtime first comes up
  • podKiller: every 1s, Start a goroutine responsible for killing pods (that are not properly handled by pod workers).
syncLoopIteration:

```go
// Arguments:
// 1. configCh:       a channel to read config events from (pod sources: http/file/apiserver)
// 2. handler:        the SyncHandler to dispatch pods to (state synchronization)
// 3. syncCh:         a channel to read periodic sync events from
// 4. housekeepingCh: a channel to read housekeeping events from
// 5. plegCh:         a channel to read PLEG updates from, i.e. container state
//                    changes (ContainerStarted/Died/Removed/...)
```
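The dispatch shape of syncLoopIteration can be sketched with a plain select over those channels; the event types and string return values below are simplified placeholders, not the real kubelet types:

```go
package main

import "fmt"

type podUpdate struct{ Op string }
type plegEvent struct{ Type string }

// syncLoopIteration performs one dispatch step: whichever channel has an
// event ready gets handled. Receiving from a nil channel blocks forever,
// so unused channels simply never fire.
func syncLoopIteration(configCh <-chan podUpdate, plegCh <-chan plegEvent,
	syncCh, housekeepingCh <-chan struct{}) string {
	select {
	case u := <-configCh:
		// dispatch to the SyncHandler (ADD/UPDATE/DELETE/...)
		return "config:" + u.Op
	case e := <-plegCh:
		// container state changed: ContainerStarted/Died/Removed/...
		return "pleg:" + e.Type
	case <-syncCh:
		// periodic sync of pods that are due for a retry
		return "sync"
	case <-housekeepingCh:
		// cleanup of orphaned pods
		return "housekeeping"
	}
}

func main() {
	configCh := make(chan podUpdate, 1)
	configCh <- podUpdate{Op: "ADD"}
	fmt.Println(syncLoopIteration(configCh, nil, nil, nil)) // config:ADD
}
```

The real function also reads liveness-probe results and returns a bool so the outer syncLoop can keep iterating; the select-and-dispatch core is the same.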

cgroup layout

https://zhuanlan.zhihu.com/p/38359775

```
# ubuntu 16.04; kubernetes v1.10.5
ubuntu@VM-0-12-ubuntu:~$ systemd-cgls
Control group /:
-.slice
├─init.scope
│ └─1 /sbin/init
├─system.slice
│ ├─avahi-daemon.service
│ │ ├─1268 avahi-daemon: running [VM-0-12-ubuntu.local
│ │ └─1283 avahi-daemon: chroot helpe
│ │ (omitted)
│ ├─dockerd.service
│ │ ├─ 5134 /usr/bin/dockerd --config-file=/etc/docker/daemon.json
│ │ ├─ 5143 docker-containerd --config /var/run/docker/containerd/containerd.toml
│ │ └─29537 docker-containerd-shim -namespace moby -workdir /data/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/303a0718c84995350d835f6e2d17036
│ │ (omitted)
│ ├─accounts-daemon.service
│ │ └─1262 /usr/lib/accountsservice/accounts-daemon
│ │ (omitted)
│ ├─NetworkManager.service
│ │ └─1287 /usr/sbin/NetworkManager --no-daemon
│ ├─kubelet.service
│ │ └─5239 /usr/bin/kubelet --cluster-dns=10.15.255.254 --network-plugin=cni --kube-reserved=cpu=80m,memory=1319Mi --cloud-config=/etc/kubernetes/qcloud.conf
│ ├─rsyslog.service
│ │ └─1251 /usr/sbin/rsyslogd -n
│ │ (omitted)
│ └─acpid.service
│   └─1293 /usr/sbin/acpid
├─user.slice
│ └─user-500.slice
│   ├─session-129315.scope
│   │ ├─27862 sshd: ubuntu [priv]
│   └─user@500.service
│     └─init.scope
│       ├─27870 /lib/systemd/systemd --user
│       └─27871 (sd-pam)
└─kubepods
  ├─burstable
  │ ├─pod5645ed58-e98f-11e9-8443-52540087514c
  │ │ ├─1f8f76dacb8334bd8d8ab2a7432d2cc250286ca6b5b73ab6dca9a845b77a3a09
  │ │ │ └─8958 /configmap-reload --webhook-url=http://localhost:9090/-/reload --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
  └─besteffort
    ├─pod3cf3ae0d-b7f4-11e9-8443-52540087514c
    │ ├─fde2178c5fa634206c2c86756c107c3de2828d2f90e2ea4c6a3b57f50c25267c
    │ │ └─5435 /pause
    │ └─5b4082efeb73ad102cc3fea33ff4c931c042a7120f0cd5277d46660aedffffde
    │   ├─ 5663 sh /install-cni.sh
    │   └─20347 sleep 3600
```

APIserver structure

A good reference: https://note.youdao.com/ynoteshare1/index.html?id=63f58c5e98634c8b3df9da2b024aacd5&type=note

[Figure: apiserver structure]

Important flows

  • CreateKubeAPIServer
    • completedConfig.InstallLegacyAPI: api/all and api/legacy control all APIs and the legacy APIs respectively
    • completedConfig.InstallAPIs
      • apiGroupInfo=restStorageBuilder.NewRESTStorage: its most important element is VersionedResourcesStorageMap map[string]map[string]rest.Storage: {"v1beta1":{"deployments":deploymentStorage.Deployment}}
        • taking "apps" as an example: if v1 is enabled: storageMap=RESTStorageProvider(storage_app).v1Storage
          • deploymentStorage = deploymentstore.NewStorage, storage["deployments"] = deploymentStorage.Deployment; deploymentStorage is made up of XXXREST elements, which are explained below
      • GenericAPIServer.InstallAPIGroups
        • s.installAPIResources: the core method for installing APIs; it wires each API to its storage
          • apiGroupVersion.InstallREST
            • installer.Install()
              • registerResourceHandlers: associates every path in the storage map with its storage
              • e.g. actions = appendIf(actions, action{"GET", itemPath, nameParams, namer, false}, isGetter)
              • handler = restfulGetResource(getter, exporter, reqScope)
              • route := ws.GET(action.Path).To(handler).Doc(doc)….
        • s.DiscoveryGroupManager.AddGroup
        • s.Handler.GoRestfulContainer.Add(discovery.NewAPIGroupHandler(s.Serializer, apiGroup).WebService())
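The registerResourceHandlers step can be sketched as follows: the supported actions are derived from which interfaces the storage object implements, and a route is bound per action. The interfaces, paths, and route type here are simplified stand-ins (the real installer inspects many more interfaces such as lister, creater, and updater):

```go
package main

import "fmt"

// getter marks storages that support GET on a single object.
type getter interface {
	Get(name string) (string, error)
}

type action struct {
	Verb string
	Path string
}

type route struct {
	action
	handler func(name string) (string, error)
}

// registerResourceHandlers derives the action list from the storage's
// interfaces and binds one route per action.
func registerResourceHandlers(resource string, storage interface{}) []route {
	var routes []route
	itemPath := "/apis/apps/v1/namespaces/{namespace}/" + resource + "/{name}"
	if g, isGetter := storage.(getter); isGetter {
		// mirrors: actions = appendIf(actions, action{"GET", itemPath, ...}, isGetter)
		routes = append(routes, route{action{"GET", itemPath}, g.Get})
	}
	return routes
}

// deploymentStorage is a toy storage that only supports Get.
type deploymentStorage map[string]string

func (d deploymentStorage) Get(name string) (string, error) {
	return d[name], nil
}

func main() {
	routes := registerResourceHandlers("deployments", deploymentStorage{"nginx": "v1"})
	fmt.Println(routes[0].Verb, routes[0].Path)
}
```

This is why a storage object "gets" verbs for free: implementing the matching interface is all it takes for the installer to expose the corresponding HTTP route.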
```go
// NewREST returns a RESTStorage object that will work against deployments.
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, *RollbackREST, error) {
	store := &genericregistry.Store{
		NewFunc:                  func() runtime.Object { return &apps.Deployment{} },
		NewListFunc:              func() runtime.Object { return &apps.DeploymentList{} },
		DefaultQualifiedResource: apps.Resource("deployments"),

		CreateStrategy: deployment.Strategy,
		UpdateStrategy: deployment.Strategy,
		DeleteStrategy: deployment.Strategy,

		TableConvertor: printerstorage.TableConvertor{TableGenerator: printers.NewTableGenerator().With(printersinternal.AddHandlers)},
	}
	options := &generic.StoreOptions{RESTOptions: optsGetter}
	if err := store.CompleteWithOptions(options); err != nil {
		return nil, nil, nil, err
	}

	statusStore := *store
	statusStore.UpdateStrategy = deployment.StatusStrategy
	return &REST{store, []string{"all"}}, &StatusREST{store: &statusStore}, &RollbackREST{store: store}, nil
}

type REST struct {
	*genericregistry.Store
	categories []string
}
```

genericregistry.Store defines NewList, NewObject, CreateStrategy, UpdateStrategy. Its core is DryRunnableStorage: the storage.Interface inside DryRunnableStorage is the actual CRUD entry point to the backing store.

```go
type DryRunnableStorage struct {
	Storage storage.Interface
	Codec   runtime.Codec
}
```

Storage is a Cacher struct wrapping the real storage (etcd3/store).

generic.StoreOptions.RESTOptions determines the backend store. It is part of completedConfig (genericapiserver.CompletedConfig) and is passed down layer by layer from the top: buildGenericConfig <- createAggregatorConfig, master.config -> completedConfig.

Eventually one finds generic.RESTOptions.Decorator = genericregistry.StorageWithCacher(cacheSize), i.e. an etcd backend with a cache (when EnableWatchCache is on, default true).

The cache implementation lives in vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go; the next section looks at this cache implementation in detail.

The cache implementation inside the apiserver

Take watch as an example; the consumer is vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go

[Figure: apiserver cache / watch flow]

| Action | Handling                                                                                                       |
| ------ | -------------------------------------------------------------------------------------------------------------- |
| Create | etcd3/store: Create                                                                                            |
| Delete | etcd3/store: Delete                                                                                            |
| Watch  | a watcher is registered with etcd3 to receive events; reads are served from the cache                          |
| Get    | when resourceVersion is "" go straight to the store; otherwise read from the cache (waiting for it to reach resourceVersion) |
| List   | similar to Get                                                                                                 |
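The Get/List decision above can be sketched like this; all types are simplified stand-ins for the real cacher, and the wait-for-resourceVersion step is elided (the real cacher blocks until the watch cache has caught up to the requested version):

```go
package main

import "fmt"

// backend stands in for the etcd3 store.
type backend struct{ data map[string]string }

// watchCache stands in for the apiserver's watch cache.
type watchCache struct {
	currentRV uint64
	data      map[string]string
}

type cacher struct {
	store *backend
	cache *watchCache
}

// Get mirrors the read-path decision: an empty resourceVersion means a
// consistent read straight from the store; any other value is served
// from the cache once it has caught up.
func (c *cacher) Get(key, resourceVersion string) string {
	if resourceVersion == "" {
		return c.store.data[key]
	}
	// elided: block until c.cache.currentRV >= requested resourceVersion
	return c.cache.data[key]
}

func main() {
	c := &cacher{
		store: &backend{data: map[string]string{"pod/a": "fresh"}},
		cache: &watchCache{currentRV: 100, data: map[string]string{"pod/a": "cached"}},
	}
	fmt.Println(c.Get("pod/a", ""))   // fresh
	fmt.Println(c.Get("pod/a", "90")) // cached
}
```

This split is why `kubectl get --resource-version=""` style reads cost an etcd round trip while informer-driven reads are served from memory.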

Debug Etcd

```shell
# download etcd
ETCD_VER=v3.4.0
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz && tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1 && rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

# configure the environment
export ETCDCTL_CERT=/etc/kubernetes/certs/kube-apiserver-etcd-client.crt
export ETCDCTL_KEY=/etc/kubernetes/certs/kube-apiserver-etcd-client.key
export ETCDCTL_CACERT=/etc/kubernetes/certs/kube-apiserver-etcd-ca.crt
export ETCDCTL_ENDPOINTS=https://etcd.cls-4lr4c4wx.ccs.tencent-cloud.com:2379

etcdctl get "" --prefix=true --limit=1 # get key and value
etcdctl get "" --prefix=true --keys-only --limit=100 # get only keys
etcdctl get "/cls-4lr4c4wx/pods" --prefix=true --keys-only --limit=10 # get pod keys; here cls-4lr4c4wx is the etcd prefix
etcdctl get "/cls-4lr4c3wx/configmaps" --prefix=true --limit 1 --write-out="json" # output as JSON
```

What the watch bookmark event in 1.16 means

For example, a client watching pods:

```
GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245&allowWatchBookmarks=true
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json

{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10596", ...}, ...}
}
{
  "type": "BOOKMARK",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"} }
}
```

If the watcher then restarts, a watcher that received the BOOKMARK can resume watching from resourceVersion=12746, while one that did not can only resume from resourceVersion=10596, even though the range 10596-12746 contained no events it cared about anyway.
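The resume behaviour can be sketched with a small simulation; the event type and the resumeFrom helper below are hypothetical, not part of client-go. The key point is that a bookmark-aware client advances its last-observed resourceVersion even when no object it cares about changed:

```go
package main

import "fmt"

type event struct {
	Type            string // ADDED, MODIFIED, DELETED, BOOKMARK
	ResourceVersion uint64
}

// resumeFrom replays a watch stream and returns the resourceVersion the
// client would restart its watch from. A client that ignores bookmarks
// is stuck at the resourceVersion of the last object event it saw.
func resumeFrom(events []event, allowBookmarks bool) uint64 {
	var last uint64
	for _, e := range events {
		if e.Type == "BOOKMARK" && !allowBookmarks {
			continue // this client never saw the bookmark
		}
		last = e.ResourceVersion
	}
	return last
}

func main() {
	stream := []event{
		{Type: "ADDED", ResourceVersion: 10596},
		{Type: "BOOKMARK", ResourceVersion: 12746},
	}
	fmt.Println(resumeFrom(stream, true))  // 12746
	fmt.Println(resumeFrom(stream, false)) // 10596
}
```

Resuming from a fresher resourceVersion matters because the apiserver keeps only a bounded event history: the older the resume point, the more likely the restart ends in a "too old resource version" error and a full relist.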

How the apiserver implements the Aggregator

[Figure: apiserver aggregation flow]

The aggregator itself is also a controller

[Figure: aggregator controller]