Pod (Part 8): Pod Scheduling: Assigning Pods to Nodes
- November 6, 2022
- Notes
- Docker containers, Kubernetes (k8s) administration
1. System environment
Server OS version | Docker version | Kubernetes (k8s) cluster version | CPU architecture |
---|---|---|---|
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64 |
Kubernetes cluster layout: k8scloude1 is the master node; k8scloude2 and k8scloude3 are the worker nodes.
Server | OS version | CPU architecture | Processes | Role |
---|---|---|---|---|
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kube-apiserver,etcd,kube-scheduler,kube-controller-manager,kubelet,kube-proxy,coredns,calico | k8s master node |
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node |
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node |
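2. Preface
This article covers pod scheduling, that is, how to make a pod run on a specific node of a Kubernetes cluster.
Pod scheduling assumes that you already have a working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the blog post "Centos7 安裝部署Kubernetes(k8s)集群" (installing and deploying a Kubernetes cluster on CentOS 7): //www.cnblogs.com/renshengdezheli/p/16686769.html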
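3. Pod scheduling
3.1 Overview of pod scheduling
You can constrain a Pod so that it is restricted to running on particular nodes, or so that it prefers to run on particular nodes. There are several ways to do this, and the recommended approaches all use label selectors to make the selection. Often such constraints are unnecessary, because the scheduler automatically makes a reasonable placement (for example, it spreads Pods across nodes and does not place Pods on nodes with insufficient free resources). In some circumstances, however, you may want more control over which node a Pod is deployed to, for example to make sure a Pod ends up on a machine with an SSD attached, or to place Pods from two different services that communicate heavily in the same availability zone.
You can use any of the following methods to choose where Kubernetes schedules a specific Pod:
- nodeSelector matched against node labels
- Affinity and anti-affinity
- The nodeName field
- Pod topology spread constraints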
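3.2 Automatic pod scheduling
If you do not specify by hand which node a pod runs on, k8s schedules the pod automatically. When choosing a node, the scheduler takes the following into account:
- The list of pods waiting to be scheduled
- The list of available nodes
- The scheduling algorithm: node filtering, then node scoring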
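The filter-then-score idea can be sketched in a few lines of Python. This is a toy model, not the kube-scheduler implementation: the `Node` class, the "fewest pods wins" scoring rule, and the `schedulable` flag that stands in for the master node are all assumptions made for illustration. The only filter modeled is a host-port conflict, which is what the hostPort example in the next section exercises.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    schedulable: bool = True  # False models a node that rejects ordinary pods (the master)
    used_host_ports: set = field(default_factory=set)
    pod_count: int = 0


def filter_nodes(nodes, host_port):
    """Filtering: keep only the nodes that could run the pod at all."""
    return [n for n in nodes if n.schedulable and host_port not in n.used_host_ports]


def score_node(node):
    """Scoring: a toy rule that prefers the node running the fewest pods."""
    return -node.pod_count


def schedule(nodes, host_port):
    """Return the chosen node name, or None when the pod has to stay Pending."""
    feasible = filter_nodes(nodes, host_port)
    if not feasible:
        return None
    best = max(feasible, key=score_node)
    best.used_host_ports.add(host_port)
    best.pod_count += 1
    return best.name


nodes = [
    Node("k8scloude1", schedulable=False),  # master node
    Node("k8scloude2"),
    Node("k8scloude3"),
]
# Three pods that all ask for host port 80.
placements = [schedule(nodes, 80) for _ in range(3)]
print(placements)
```

In this model the first two pods land on the two workers and the third gets `None`, which corresponds to a Pending pod. Which worker receives the first pod depends on the real scheduler's scoring, so the order may differ on an actual cluster.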
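3.2.1 Creating three pods with host port 80
Look at the description of the hostPort field. hostPort maps a pod port onto the node, that is, it exposes the pod's port on the node itself.
#Host port mapping: hostPort: 80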
[root@k8scloude1 pod]# kubectl explain pods.spec.containers.ports.hostPort
KIND: Pod
VERSION: v1
FIELD: hostPort <integer>
DESCRIPTION:
Number of port to expose on the host. If specified, this must be a valid
port number, 0 < x < 65536. If HostNetwork is specified, this must match
ContainerPort. Most containers do not need this.
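Create the first pod. hostPort: 80 maps port 80 of the container to port 80 of the node.
[root@k8scloude1 pod]# vim schedulepod.yaml
#kind: Pod sets the resource type to Pod; labels sets the pod labels; name under metadata sets the pod name; everything under containers defines the containers
#image sets the image name; imagePullPolicy sets the image pull policy; name under containers sets the container name
#resources sets the container resources (CPU, memory, etc.); env sets environment variables in the container; dnsPolicy sets the DNS policy
#restartPolicy sets the container restart policy; ports lists the container ports; containerPort is the container port; hostPort is the port on the node
[root@k8scloude1 pod]# cat schedulepod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
  namespace: pod
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}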
[root@k8scloude1 pod]# kubectl apply -f schedulepod.yaml
pod/pod created
[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 6s
The pod was created successfully.
Next, create the second pod. hostPort: 80 again maps port 80 of the container to port 80 of the node. The two manifests differ only in the pod name (and the matching label and container name).
[root@k8scloude1 pod]# cp schedulepod.yaml schedulepod1.yaml
[root@k8scloude1 pod]# vim schedulepod1.yaml
[root@k8scloude1 pod]# cat schedulepod1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f schedulepod1.yaml
pod/pod1 created
[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 11m
pod1 1/1 Running 0 5s
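The second pod was created successfully. Now create the third pod.
As described at the start, k8scloude1 is the master node and k8scloude2 and k8scloude3 are the worker nodes. The cluster has only two worker nodes, and the master node does not run application pods by default. Host port 80 is already taken on both worker nodes, so pod2 cannot run.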
[root@k8scloude1 pod]# sed 's/pod1/pod2/' schedulepod1.yaml | kubectl apply -f -
pod/pod2 created
#Host port 80 is already taken on both worker nodes, so pod2 cannot run
[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 16m
pod1 1/1 Running 0 5m28s
pod2 0/1 Pending 0 5s
Check how the pods are distributed across the cluster. The NODE column shows which node each pod runs on.
[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 18m
pod1 1/1 Running 0 7m28s
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod 1/1 Running 0 29m 10.244.251.208 k8scloude3 <none> <none>
pod1 1/1 Running 0 18m 10.244.112.156 k8scloude2 <none> <none>
Delete the pods
[root@k8scloude1 pod]# kubectl delete pod pod2
pod "pod2" deleted
[root@k8scloude1 pod]# kubectl delete pod pod1 pod
pod "pod1" deleted
pod "pod" deleted
The three pods above were all scheduled automatically by k8s. Next we specify by hand which node a pod runs on.
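3.3 Using the nodeName field to choose the node
The nodeName field is the most direct way to choose a node. nodeName is a field in the Pod spec. If nodeName is not empty, the scheduler ignores the Pod, and the kubelet on the named node tries to place the Pod on that node. A nodeName rule takes precedence over nodeSelector and over affinity and anti-affinity rules.
Selecting a node with nodeName has some limitations:
- If the named node does not exist, the Pod will not run, and in some cases it may be deleted automatically.
- If the named node cannot provide the resources needed to run the Pod, the Pod fails, and the failure reason indicates whether it could not run because of insufficient memory or CPU.
- Node names in cloud environments are not always predictable or stable.
Create a pod. nodeName: k8scloude3 means the pod must run on the node named k8scloude3.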
[root@k8scloude1 pod]# vim schedulepod2.yaml
#kind: Pod sets the resource type to Pod; labels sets the pod labels; name under metadata sets the pod name; everything under containers defines the containers
#image sets the image name; imagePullPolicy sets the image pull policy; name under containers sets the container name
#resources sets the container resources (CPU, memory, etc.); env sets environment variables in the container; dnsPolicy sets the DNS policy
#restartPolicy sets the container restart policy; ports lists the container ports; containerPort is the container port; hostPort is the port on the node
#nodeName: k8scloude3 makes the pod run on k8scloude3
[root@k8scloude1 pod]# cat schedulepod2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeName: k8scloude3
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f schedulepod2.yaml
pod/pod1 created
The pod is running on node k8scloude3.
[root@k8scloude1 pod]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 7s 10.244.251.209 k8scloude3 <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
[root@k8scloude1 pod]# kubectl get pods
No resources found in pod namespace.
Create a pod. nodeName: k8scloude1 makes the pod run on node k8scloude1.
[root@k8scloude1 pod]# vim schedulepod3.yaml
#kind: Pod sets the resource type to Pod; labels sets the pod labels; name under metadata sets the pod name; everything under containers defines the containers
#image sets the image name; imagePullPolicy sets the image pull policy; name under containers sets the container name
#resources sets the container resources (CPU, memory, etc.); env sets environment variables in the container; dnsPolicy sets the DNS policy
#restartPolicy sets the container restart policy; ports lists the container ports; containerPort is the container port; hostPort is the port on the node
#nodeName: k8scloude1 makes the pod run on node k8scloude1
[root@k8scloude1 pod]# cat schedulepod3.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeName: k8scloude1
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f schedulepod3.yaml
pod/pod1 created
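The pod is running on k8scloude1. Note that k8scloude1 is the master node, which normally does not run application pods, and that it carries a taint. A pod that goes through the scheduler is normally not placed on a tainted node unless it tolerates the taint; if the tainted node is the only candidate, the pod stays Pending. nodeName bypasses the scheduler, however, so a pod can be bound directly to a tainted node and run there normally, as this pod does on the master. This holds for taints with the NoSchedule effect, which are checked only at scheduling time.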
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 47s 10.244.158.81 k8scloude1 <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
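The difference between the scheduler path and the nodeName path can be modeled as below. This is a simplified sketch, not Kubernetes code: the taint shown for k8scloude1 is an assumption (kubeadm-built v1.21 clusters usually carry node-role.kubernetes.io/master:NoSchedule; `kubectl describe node k8scloude1` shows the real value), and `scheduler_allows` and `place` are hypothetical helper names.

```python
# Assumed taints; only the master carries one in this model.
NODE_TAINTS = {
    "k8scloude1": [("node-role.kubernetes.io/master", "NoSchedule")],
    "k8scloude2": [],
    "k8scloude3": [],
}


def scheduler_allows(node, tolerated_keys=()):
    """Scheduler-side check: every NoSchedule taint on the node must be tolerated."""
    return all(
        key in tolerated_keys
        for key, effect in NODE_TAINTS[node]
        if effect == "NoSchedule"
    )


def place(node_name=None, candidates=(), tolerated_keys=()):
    """Return the node a pod lands on, or None if it stays Pending."""
    if node_name:
        # spec.nodeName is set: the scheduler never sees the pod,
        # so the taint check above is not run at all.
        return node_name
    allowed = [n for n in candidates if scheduler_allows(n, tolerated_keys)]
    return allowed[0] if allowed else None


print(place(node_name="k8scloude1"))     # bound directly to the master
print(place(candidates=["k8scloude1"]))  # filtered out by the taint
```

The first call returns the master because no filtering happens; the second returns `None` because the only candidate is rejected by the taint check.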
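3.4 Using node labels with nodeSelector to choose the node
Like many other Kubernetes objects, nodes have labels. You can attach labels manually. Kubernetes also adds a set of standard labels to every node in the cluster.
By adding labels to nodes, you can target Pods at specific nodes or groups of nodes. You can use this to make sure that particular Pods only run on nodes with certain isolation, security, or regulatory properties.
nodeSelector is the simplest recommended form of node selection constraint. You add the nodeSelector field to the Pod spec and list the node labels that the target node must have. Kubernetes only schedules the Pod onto nodes that have every label you specify.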
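The "every label you specify" rule amounts to a subset check, sketched below. The function name `matches_node_selector` is hypothetical, and the label sets are abbreviated; `k8snodename` is the custom label that this post attaches to nodes in the following sections.

```python
def matches_node_selector(node_labels, node_selector):
    """A node qualifies only if it carries every key=value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


nodes = {
    "k8scloude2": {
        "kubernetes.io/hostname": "k8scloude2",
        "kubernetes.io/os": "linux",
        "k8snodename": "k8scloude2",
    },
    "k8scloude3": {
        "kubernetes.io/hostname": "k8scloude3",
        "kubernetes.io/os": "linux",
    },
}

selector = {"k8snodename": "k8scloude2", "kubernetes.io/os": "linux"}
eligible = [name for name, labels in nodes.items() if matches_node_selector(labels, selector)]
print(eligible)
```

Only k8scloude2 carries both labels, so it is the only eligible node. A selector with just `kubernetes.io/os: linux` would accept both workers.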
3.4.1 Viewing labels
View the labels on the nodes. Labels are key=value pairs, for example xxxx/yyyy.aaaa=456123,xxxx1/yyyy1.aaaa=456123. The --show-labels flag displays them.
[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
View the labels on namespaces
[root@k8scloude1 pod]# kubectl get ns --show-labels
NAME STATUS AGE LABELS
default Active 7d1h kubernetes.io/metadata.name=default
kube-node-lease Active 7d1h kubernetes.io/metadata.name=kube-node-lease
kube-public Active 7d1h kubernetes.io/metadata.name=kube-public
kube-system Active 7d1h kubernetes.io/metadata.name=kube-system
ns1 Active 6d5h kubernetes.io/metadata.name=ns1
ns2 Active 6d5h kubernetes.io/metadata.name=ns2
pod Active 4d2h kubernetes.io/metadata.name=pod
View the labels on pods
[root@k8scloude1 pod]# kubectl get pod -A --show-labels
NAMESPACE NAME READY STATUS RESTARTS AGE LABELS
kube-system calico-kube-controllers-6b9fbfff44-4jzkj 1/1 Running 12 7d k8s-app=calico-kube-controllers,pod-template-hash=6b9fbfff44
kube-system calico-node-bdlgm 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system calico-node-hx8bk 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system calico-node-nsbfs 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system coredns-545d6fc579-7wm95 1/1 Running 7 7d1h k8s-app=kube-dns,pod-template-hash=545d6fc579
kube-system coredns-545d6fc579-87q8j 1/1 Running 7 7d1h k8s-app=kube-dns,pod-template-hash=545d6fc579
kube-system etcd-k8scloude1 1/1 Running 7 7d1h component=etcd,tier=control-plane
kube-system kube-apiserver-k8scloude1 1/1 Running 11 7d1h component=kube-apiserver,tier=control-plane
kube-system kube-controller-manager-k8scloude1 1/1 Running 7 7d1h component=kube-controller-manager,tier=control-plane
kube-system kube-proxy-599xh 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-lpj8z 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-zxlk9 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-scheduler-k8scloude1 1/1 Running 7 7d1h component=kube-scheduler,tier=control-plane
kube-system metrics-server-bcfb98c76-k5dmj 1/1 Running 6 6d5h k8s-app=metrics-server,pod-template-hash=bcfb98c76
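3.4.2 Creating labels
Take the label node-role.kubernetes.io/control-plane= as an example: the key is node-role.kubernetes.io/control-plane and the value is empty.
Syntax for creating a label: kubectl label OBJECT_TYPE OBJECT_NAME KEY=VALUE
Set a label on node k8scloude2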
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,k8snodename=k8scloude2,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
Remove the label from node k8scloude2 (append - to the key)
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
List the nodes that carry the label k8snodename=k8scloude2
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
#List the nodes that carry the label k8snodename=k8scloude2
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
NAME STATUS ROLES AGE VERSION
k8scloude2 Ready <none> 7d1h v1.21.0
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled
Set a label on all nodes
[root@k8scloude1 pod]# kubectl label nodes --all k8snodename=cloude
node/k8scloude1 labeled
node/k8scloude2 labeled
node/k8scloude3 labeled
List the nodes that carry the label k8snodename=cloude
#List the nodes that carry the label k8snodename=cloude
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 7d1h v1.21.0
k8scloude2 Ready <none> 7d1h v1.21.0
k8scloude3 Ready <none> 7d1h v1.21.0
#Remove the label
[root@k8scloude1 pod]# kubectl label nodes --all k8snodename-
node/k8scloude1 labeled
node/k8scloude2 labeled
node/k8scloude3 labeled
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude
No resources found
Overwriting a label with the --overwrite flag
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled
#Try to overwrite the label without --overwrite
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude
error: 'k8snodename' already has a value (k8scloude2), and --overwrite is false
#Overwrite the label with the --overwrite flag
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude --overwrite
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
No resources found
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude
NAME STATUS ROLES AGE VERSION
k8scloude2 Ready <none> 7d1h v1.21.0
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled
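The set, remove and overwrite behavior shown in the transcripts above can be summarized with a small model. This is a sketch of the observed rules only, not kubectl's implementation; `apply_label` is a hypothetical helper, and treating a repeated identical value as harmless is an assumption.

```python
def apply_label(labels, arg, overwrite=False):
    """Apply one kubectl-style label argument to a dict and return the new dict."""
    updated = dict(labels)
    if arg.endswith("-") and "=" not in arg:
        # "key-" removes the label; a missing key is ignored in this model.
        updated.pop(arg[:-1], None)
        return updated
    key, value = arg.split("=", 1)  # "key=value" sets the label
    if key in updated and updated[key] != value and not overwrite:
        raise ValueError(
            f"'{key}' already has a value ({updated[key]}), and --overwrite is false"
        )
    updated[key] = value
    return updated


labels = apply_label({}, "k8snodename=k8scloude2")
try:
    apply_label(labels, "k8snodename=k8scloude")
    refused = False
except ValueError:
    refused = True  # changing an existing value needs overwrite=True

labels = apply_label(labels, "k8snodename=k8scloude", overwrite=True)
after_overwrite = dict(labels)
labels = apply_label(labels, "k8snodename-")
print(refused, after_overwrite, labels)
```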
Tip: if you do not want control-plane to appear in the ROLES column for k8scloude1, remove the corresponding label: kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-
[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
[root@k8scloude1 pod]# kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-
3.4.3 Using labels to control which node a pod runs on
Label node k8scloude2 with k8snodename=k8scloude2
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
NAME STATUS ROLES AGE VERSION
k8scloude2 Ready <none> 7d1h v1.21.0
[root@k8scloude1 pod]# kubectl get pods
No resources found in pod namespace.
Create a pod. The nodeSelector entry k8snodename: k8scloude2 makes the pod run on a node labeled k8snodename=k8scloude2.
[root@k8scloude1 pod]# vim schedulepod4.yaml
[root@k8scloude1 pod]# cat schedulepod4.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeSelector:
    k8snodename: k8scloude2
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f schedulepod4.yaml
pod/pod1 created
The pod is running on node k8scloude2.
[root@k8scloude1 pod]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 21s 10.244.112.158 k8scloude2 <none> <none>
Delete the pod and remove the label
[root@k8scloude1 pod]# kubectl get pod --show-labels
NAME READY STATUS RESTARTS AGE LABELS
pod1 1/1 Running 0 32m run=pod1
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
[root@k8scloude1 pod]# kubectl get pod --show-labels
No resources found in pod namespace.
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
No resources found
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude
No resources found
Note: if two nodes carry the same label, the scheduler scores both of them and the pod runs on whichever node scores higher.
Label the master node of the cluster
[root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename=k8scloude1
node/k8scloude1 labeled
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 7d2h v1.21.0
Create a pod. The nodeSelector entry k8snodename: k8scloude1 makes the pod run on a node labeled k8snodename=k8scloude1.
[root@k8scloude1 pod]# vim schedulepod5.yaml
[root@k8scloude1 pod]# cat schedulepod5.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeSelector:
    k8snodename: k8scloude1
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f schedulepod5.yaml
pod/pod1 created
k8scloude1 carries a taint, so the scheduler will not place the pod there. It is the only node with the required label, so the pod stays Pending.
[root@k8scloude1 pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod1 0/1 Pending 0 9s
Delete the pod and remove the label
[root@k8scloude1 pod]# kubectl delete pod pod1
pod "pod1" deleted
[root@k8scloude1 pod]# kubectl get pod
No resources found in pod namespace.
[root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename-
node/k8scloude1 labeled
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1
No resources found
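3.5 Scheduling pods with affinity and anti-affinity
nodeSelector is the simplest way to constrain Pods to nodes with specific labels. Affinity and anti-affinity expand the types of constraints you can define. Some of their benefits are:
- The affinity/anti-affinity language is more expressive. nodeSelector only selects nodes that have all the specified labels. Affinity/anti-affinity gives you more control over the selection logic.
- You can mark a rule as a "soft requirement" or "preference", so that the scheduler still schedules the Pod even when it cannot find a matching node.
- You can constrain scheduling using the labels of other Pods running on the node (or in another topology domain), instead of only the labels of the node itself. This lets you define rules for which Pods may be placed together.
The affinity feature consists of two types of affinity:
- Node affinity works like the nodeSelector field but is more expressive and lets you specify soft rules.
- Inter-pod affinity/anti-affinity lets you constrain Pods based on the labels of other Pods.
Node affinity is conceptually similar to nodeSelector: it lets you constrain which nodes a Pod can be scheduled on based on node labels. There are two types of node affinity:
- requiredDuringSchedulingIgnoredDuringExecution: the scheduler can only schedule the Pod if the rule is met. This works like nodeSelector, but with a more expressive syntax.
- preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to find a node that meets the rule. If no matching node is found, the scheduler still schedules the Pod.
In both types, IgnoredDuringExecution means that if the node labels change after Kubernetes has scheduled the Pod, the Pod keeps running.
You set node affinity through the .spec.affinity.nodeAffinity field of the Pod spec.
View the description of the nodeAffinity field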
[root@k8scloude1 pod]# kubectl explain pods.spec.affinity.nodeAffinity
KIND: Pod
VERSION: v1
RESOURCE: nodeAffinity <Object>
DESCRIPTION:
Describes node affinity scheduling rules for the pod.
Node affinity is a group of node affinity scheduling rules.
FIELDS:
#Soft policy
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.
#Hard policy
requiredDuringSchedulingIgnoredDuringExecution <Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
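3.5.1 Using the hard policy requiredDuringSchedulingIgnoredDuringExecution
Create a pod. The requiredDuringSchedulingIgnoredDuringExecution rule says that the node must have a label with the key kubernetes.io/hostname whose value is either k8scloude2 or k8scloude3.
You can use the operator field to set the logical operator that Kubernetes uses when interpreting the rules. The available operators are In, NotIn, Exists, DoesNotExist, Gt and Lt. NotIn and DoesNotExist can be used to implement node anti-affinity. You can also use node taints to repel Pods from specific nodes.
Note:
- If you specify both nodeSelector and nodeAffinity, both must be satisfied for the Pod to be scheduled onto a candidate node.
- If you specify multiple nodeSelectorTerms for a nodeAffinity type, the Pod can be scheduled onto a node as soon as one of the nodeSelectorTerms is satisfied.
- If you specify multiple matchExpressions within a single nodeSelectorTerms entry, the Pod can be scheduled onto a node only if all of the matchExpressions are satisfied.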
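The operator semantics and the OR/AND rules in the notes above can be sketched as follows. This is an illustrative model under stated assumptions, not the scheduler's code: the function names are hypothetical, plain dicts stand in for the API objects, and Gt/Lt are modeled as integer comparisons on a single value.

```python
def match_expression(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values  # also true when the key is absent
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        if key not in labels:
            return False
        try:
            have, want = int(labels[key]), int(values[0])
        except ValueError:
            return False  # non-integer label values never match
        return have > want if op == "Gt" else have < want
    raise ValueError(f"unknown operator: {op}")


def matches_required(labels, node_selector_terms):
    """Terms are ORed; the expressions inside one term are ANDed."""
    for term in node_selector_terms:
        exprs = term.get("matchExpressions", [])
        # An empty term is treated as matching nothing.
        if exprs and all(match_expression(labels, e) for e in exprs):
            return True
    return False


nodes = {name: {"kubernetes.io/hostname": name}
         for name in ("k8scloude1", "k8scloude2", "k8scloude3")}
terms = [{"matchExpressions": [
    {"key": "kubernetes.io/hostname", "operator": "In",
     "values": ["k8scloude2", "k8scloude3"]},
]}]
eligible = [n for n, labels in nodes.items() if matches_required(labels, terms)]
print(eligible)
```

With the hostname rule used in this section, only the two workers qualify. Swapping the values for names that no node carries leaves the list empty, which corresponds to a Pending pod.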
[root@k8scloude1 pod]# vim requiredDuringSchedule.yaml
#Hard policy
[root@k8scloude1 pod]# cat requiredDuringSchedule.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8scloude2
            - k8scloude3
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule.yaml
pod/pod1 created
The pod is running on node k8scloude3.
[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod1 1/1 Running 0 6s
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 10s 10.244.251.212 k8scloude3 <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
Create a pod. This time the requiredDuringSchedulingIgnoredDuringExecution rule says that the node must have a label with the key kubernetes.io/hostname whose value is either k8scloude4 or k8scloude5.
[root@k8scloude1 pod]# vim requiredDuringSchedule1.yaml
[root@k8scloude1 pod]# cat requiredDuringSchedule1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8scloude4
            - k8scloude5
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule1.yaml
pod/pod1 created
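requiredDuringSchedulingIgnoredDuringExecution is a hard policy, and the cluster has no node named k8scloude4 or k8scloude5. No node satisfies the rule, so the pod cannot be scheduled and stays Pending.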
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 0/1 Pending 0 7s <none> <none> <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
3.5.2 Using the soft policy preferredDuringSchedulingIgnoredDuringExecution
Label the nodes
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 xx=72
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl label nodes k8scloude3 xx=59
node/k8scloude3 labeled
Create a pod. The preferredDuringSchedulingIgnoredDuringExecution rule says that the node should preferably have a label with the key xx whose value is greater than 60.
[root@k8scloude1 pod]# vim preferredDuringSchedule.yaml
[root@k8scloude1 pod]# cat preferredDuringSchedule.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "60"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule.yaml
pod/pod1 created
The pod runs on k8scloude2, because k8scloude2 has the label xx=72 and 72 is greater than 60.
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 13s 10.244.112.159 k8scloude2 <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
Create a pod. This time the preferredDuringSchedulingIgnoredDuringExecution rule says that the node should preferably have a label with the key xx whose value is greater than 600.
[root@k8scloude1 pod]# vim preferredDuringSchedule1.yaml
[root@k8scloude1 pod]# cat preferredDuringSchedule1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "600"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule1.yaml
pod/pod1 created
preferredDuringSchedulingIgnoredDuringExecution is a soft policy, so although neither k8scloude2 nor k8scloude3 satisfies xx>600, the pod is still scheduled and runs.
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 7s 10.244.251.213 k8scloude3 <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
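3.5.3 Node affinity weight
You can set a weight field, with a value from 1 to 100, on each instance of the preferredDuringSchedulingIgnoredDuringExecution affinity type. When the scheduler finds nodes that meet all the other scheduling requirements of the Pod, it iterates over every preference rule that a node satisfies and adds up the weight values of the matching expressions. The resulting sum is added to the node's score from the other priority functions. When the scheduler makes its decision for the Pod, the node with the highest total score has the highest priority.
Label the nodes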
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 yy=59
node/k8scloude2 labeled
[root@k8scloude1 pod]# kubectl label nodes k8scloude3 yy=72
node/k8scloude3 labeled
Create a pod. preferredDuringSchedulingIgnoredDuringExecution specifies two soft rules with different weights: weight: 2 and weight: 10.
[root@k8scloude1 pod]# vim preferredDuringSchedule2.yaml
[root@k8scloude1 pod]# cat preferredDuringSchedule2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "60"
      - weight: 10
        preference:
          matchExpressions:
          - key: yy
            operator: Gt
            values:
            - "60"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule2.yaml
pod/pod1 created
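Both workers are candidates. k8scloude2 (xx=72, yy=59) matches only the xx>60 rule with weight 2, while k8scloude3 (xx=59, yy=72) matches only the yy>60 rule with weight 10, so k8scloude3 gets the higher affinity score and the pod runs there.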
[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 10s 10.244.251.214 k8scloude3 <none> <none>
[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
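The affinity part of the score in this example can be reproduced by hand. The sketch below only models the weight sum for the two rules and the node labels used above; the real scheduler combines this sum with the scores of its other scoring functions, so it is an approximation of one input to the decision. The names `PREFERENCES`, `NODES` and `affinity_score` are hypothetical.

```python
# (weight, label key, threshold) for each preferred rule, all using the Gt operator.
PREFERENCES = [
    (2, "xx", 60),
    (10, "yy", 60),
]

# Node labels as set with kubectl label in sections 3.5.2 and 3.5.3.
NODES = {
    "k8scloude2": {"xx": "72", "yy": "59"},
    "k8scloude3": {"xx": "59", "yy": "72"},
}


def greater_than(labels, key, threshold):
    """Gt: the label must exist and its integer value must exceed the threshold."""
    return key in labels and int(labels[key]) > threshold


def affinity_score(labels):
    """Sum the weights of every preference that the node satisfies."""
    return sum(
        weight
        for weight, key, threshold in PREFERENCES
        if greater_than(labels, key, threshold)
    )


scores = {name: affinity_score(labels) for name, labels in NODES.items()}
print(scores)                       # {'k8scloude2': 2, 'k8scloude3': 10}
print(max(scores, key=scores.get))  # k8scloude3
```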
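3.6 Pod topology spread constraints
You can use topology spread constraints to control how Pods are distributed across failure domains in the cluster, such as regions, zones, nodes, and other user-defined topology domains. This helps to improve performance, achieve high availability, or improve resource utilization.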
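As a rough illustration of what "spreading" means, the sketch below checks which domains could accept one more pod without the distribution becoming too uneven. It assumes the maxSkew idea from the topologySpreadConstraints field (the pod count in a domain after placement, minus the smallest count in any domain, must not exceed maxSkew) and treats each node as its own topology domain. `allowed_domains` is a hypothetical helper, and the real scheduler applies further rules that are not modeled here.

```python
from collections import Counter


def allowed_domains(pod_domains, all_domains, max_skew=1):
    """Return the domains where placing one more pod keeps the skew within max_skew."""
    counts = Counter(pod_domains)
    fewest = min(counts.get(domain, 0) for domain in all_domains)
    return [
        domain
        for domain in all_domains
        if counts.get(domain, 0) + 1 - fewest <= max_skew
    ]


# Two matching pods already run on k8scloude2, none on k8scloude3.
existing = ["k8scloude2", "k8scloude2"]
workers = ["k8scloude2", "k8scloude3"]
print(allowed_domains(existing, workers, max_skew=1))
```

With two pods already on k8scloude2, only k8scloude3 can take the next pod when the allowed skew is 1.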