[Kubernetes Series, Part 5] Using Kubernetes on AWS: EKS

  • October 3, 2019
  • Notes

1. Overview

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy to run Kubernetes on AWS without having to stand up or maintain your own Kubernetes control plane.

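Amazon EKS runs the Kubernetes control plane instances across multiple Availability Zones to ensure high availability. It automatically detects and replaces unhealthy control plane instances, and provides automated version upgrades and patching for them.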

Amazon EKS also integrates with many AWS services to provide scalability and security for your applications, including:

  • Amazon ECR for container images
  • Elastic Load Balancing for load distribution
  • IAM for authentication
  • Amazon VPC for isolation

2. Versions

Kubernetes version | Release date | EKS platform version | EKS release notes
------------------ | ------------ | -------------------- | -----------------
1.13.7 | 2019-06-07 | eks.1 | Initial release of Kubernetes 1.13 for Amazon EKS. For more information, see Kubernetes 1.13.
1.12.6 | 2019-02-27 | eks.2 | New platform version to support custom DNS names in the Kubelet certificate and improve etcd performance. This fixes a bug that caused worker node Kubelet daemons to request a new certificate every few seconds.
1.12.6 | 2019-02-27 | eks.1 | Initial release of Kubernetes 1.12 for Amazon EKS.
1.11.8 | 2019-03-01 | eks.3 | New platform version to support custom DNS names in the Kubelet certificate and improve etcd performance.
1.11.8 | 2019-03-01 | eks.2 | New platform version updating Amazon EKS Kubernetes 1.11 clusters to patch level 1.11.8 to address CVE-2019-1002100.

3. Prerequisites

3.1. Tooling

3.1.1. Python

  • Required version: >= 2.7.9
  • Purpose: needed to install the AWS CLI

3.1.2. AWS CLI

  • Required version: >= 1.16.156
  • Purpose: manage AWS resources
  • Installation:
pip install awscli --upgrade --user
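To confirm that the installed AWS CLI meets the 1.16.156 minimum, you can compare the version reported by `aws --version`. This is a minimal sketch, assuming the output contains the usual `aws-cli/X.Y.Z` token (for example `aws-cli/1.16.156 Python/3.7.3 ...`); the function names are hypothetical helpers, not part of any AWS tool.

```python
import re
import subprocess

MIN_AWS_CLI = (1, 16, 156)  # minimum version stated above


def parse_aws_cli_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `aws --version` output."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized aws --version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def aws_cli_is_recent_enough(output: str) -> bool:
    # Tuples compare element by element, so (1, 16, 200) >= (1, 16, 156).
    return parse_aws_cli_version(output) >= MIN_AWS_CLI


def check_installed_aws_cli() -> bool:
    """Run `aws --version` and compare it with the minimum (needs `aws` on PATH)."""
    # Some AWS CLI v1 setups may print the version to stderr, so read both streams.
    result = subprocess.run(["aws", "--version"], capture_output=True, text=True, check=True)
    return aws_cli_is_recent_enough(result.stdout + result.stderr)
```

Calling `check_installed_aws_cli()` after the `pip install` step returns `True` once the upgrade has taken effect.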

3.1.3. eksctl

  • Required version: >= 0.1.37
  • Purpose: manage Amazon EKS resources
  • Installation:
curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version

3.1.4. kubectl

  • Required version: the latest release, or at least one within one minor version of your cluster's Kubernetes version.
  • Purpose: operate the Kubernetes cluster
  • Installation:
curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
kubectl version --client
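Once a cluster exists, `kubectl version -o json` reports both the client and the server version, so the one-minor-version rule can be checked mechanically. A sketch, assuming the JSON has `clientVersion` and `serverVersion` objects with string `major`/`minor` fields; EKS server minors can carry a suffix such as `13+`, so only the leading digits are compared. The helper names are hypothetical.

```python
import json
import re


def minor_version(info: dict) -> int:
    """Return the minor version from a kubectl version-info object.

    Server minors such as "13+" keep only their leading digits.
    """
    match = re.match(r"\d+", info["minor"])
    if match is None:
        raise ValueError(f"unexpected minor version: {info['minor']!r}")
    return int(match.group())


def kubectl_skew_ok(version_json: str, max_skew: int = 1) -> bool:
    """Check that client and server differ by at most `max_skew` minor versions."""
    data = json.loads(version_json)
    client, server = data["clientVersion"], data["serverVersion"]
    if client["major"] != server["major"]:
        return False
    return abs(minor_version(client) - minor_version(server)) <= max_skew
```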

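3.2. IAM permissions

References:

  1. Amazon EKS identity-based policy examples
  2. https://github.com/weaveworks/eksctl/issues/204#issuecomment-450280786 (the commenter says it took him more than 30 attempts to complete this list; it took me nearly 40)
  3. https://docs.aws.amazon.com/autoscaling/ec2/userguide/control-access-using-iam.html

Note: the account also needs enough spare quota for gateways, VPCs, and (Elastic) IP addresses; otherwise cluster creation fails with a limit-exceeded error.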

3.2.1. Full CloudFormation access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:*"
      ],
      "Resource": "*"
    }
  ]
}

3.2.2. EKS read/write access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "eks:ListClusters",
        "eks:CreateCluster"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "eks:UpdateClusterVersion",
        "eks:ListUpdates",
        "eks:DescribeUpdate",
        "eks:DescribeCluster"
      ],
      "Resource": "arn:aws:eks:*:*:cluster/*"
    }
  ]
}

3.2.3. EC2 permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateInternetGateway",
        "ec2:CreateVpc",
        "ec2:Describe*",
        "ec2:CreateTags",
        "ec2:ModifyVpcAttribute",
        "ec2:CreateSubnet",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:DeleteSecurityGroup",
        "ec2:AttachInternetGateway",
        "ec2:CreateRoute",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:AssociateRouteTable",
        "ec2:CreateNatGateway",
        "ec2:AllocateAddress",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteNatGateway",
        "ec2:DeleteRoute",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSubnet",
        "ec2:DeleteTags",
        "ec2:DeleteVpc",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeNatGateways",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeTags",
        "ec2:DescribeVpcAttribute",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateRouteTable",
        "ec2:RunInstances",
        "ec2:ReleaseAddress"
      ],
      "Resource": "*"
    }
  ]
}

3.2.4. CloudWatch permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:Describe*"
      ],
      "Resource": "*"
    }
  ]
}

3.2.5. Auto Scaling permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:CreateAutoScalingGroup",
        "autoscaling:DeleteAutoScalingGroup",
        "autoscaling:DeleteLaunchConfiguration",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:UpdateAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}

3.2.6. Elastic Load Balancing permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:Describe*",
      "Resource": "*"
    }
  ]
}

3.2.7. IAM permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:DetachRolePolicy",
        "iam:GetRole",
        "iam:PassRole",
        "iam:CreateInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:GetInstanceProfile",
        "iam:PutRolePolicy",
        "iam:DeleteRolePolicy",
        "iam:GetRolePolicy",
        "iam:ListInstanceProfiles",
        "iam:CreateServiceLinkedRole",
        "iam:ListInstanceProfilesForRole"
      ],
      "Resource": "*"
    }
  ]
}

3.2.8. LaunchTemplate permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "autoscaling:CreateLaunchConfiguration",
        "ec2:DeleteLaunchTemplate",
        "ec2:ModifyLaunchTemplate",
        "ec2:DeleteLaunchTemplateVersions",
        "ec2:CreateLaunchTemplateVersion"
      ],
      "Resource": [
        "arn:aws:autoscaling:*:*:launchConfiguration:*:launchConfigurationName/*",
        "arn:aws:ec2:*:*:launch-template/*"
      ]
    }
  ]
}
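The policies above can be attached one by one, or combined into a single customer-managed policy. A minimal sketch of combining them, using hypothetical helpers `load_policy` and `merge_policies`: the merge drops `Sid` values because the copied snippets reuse names such as `VisualEditor0` (Sids should be unique within one policy), removes duplicate actions, and strict JSON parsing catches mistakes such as a trailing comma before the document is pasted into IAM.

```python
import json


def load_policy(text: str) -> dict:
    """Parse a policy document strictly; json.loads rejects trailing commas."""
    policy = json.loads(text)
    if "Statement" not in policy:
        raise ValueError("policy document has no Statement")
    return policy


def merge_policies(*policies: dict) -> dict:
    """Combine the Statement lists of several IAM policy documents into one.

    Duplicate actions inside a statement are dropped; IAM action names are
    case-insensitive, so "ec2:createTags" and "ec2:CreateTags" count as one.
    """
    merged = []
    for policy in policies:
        statements = policy["Statement"]
        if isinstance(statements, dict):  # a single statement given as an object
            statements = [statements]
        for stmt in statements:
            stmt = dict(stmt)  # shallow copy so the caller's data is not mutated
            stmt.pop("Sid", None)  # Sids repeat across the snippets above
            actions = stmt.get("Action")
            if isinstance(actions, list):
                seen, unique = set(), []
                for action in actions:
                    if action.lower() not in seen:
                        seen.add(action.lower())
                        unique.append(action)
                stmt["Action"] = unique
            merged.append(stmt)
    return {"Version": "2012-10-17", "Statement": merged}
```

`json.dumps(merge_policies(...), indent=2)` then yields one document that can be pasted into the IAM console.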

3.3. Install aws-iam-authenticator

See: https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/install-aws-iam-authenticator.html

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.13.7/2019-06-11/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$HOME/bin:$PATH
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
# Optional check: generate a token for the cluster
aws-iam-authenticator token -i <cluster name>
# Show which IAM identity the AWS CLI is using
aws sts get-caller-identity
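The output of the `token` command shows how EKS authentication works: the token is the prefix `k8s-aws-v1.` followed by an unpadded base64url encoding of a presigned STS `GetCallerIdentity` URL, which the cluster uses to confirm your IAM identity (the same identity `aws sts get-caller-identity` prints). A sketch for inspecting it, assuming the command prints an `ExecCredential` JSON with the token under `status.token`; the helper names are hypothetical. Tokens are short-lived credentials, so avoid logging real ones.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

TOKEN_PREFIX = "k8s-aws-v1."


def decode_eks_token(token: str) -> str:
    """Turn an EKS bearer token back into the presigned STS URL it wraps."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not an EKS token")
    payload = token[len(TOKEN_PREFIX):]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return base64.urlsafe_b64decode(payload).decode("utf-8")


def describe_exec_credential(exec_credential_json: str) -> dict:
    """Summarize the JSON printed by `aws-iam-authenticator token -i <cluster name>`."""
    token = json.loads(exec_credential_json)["status"]["token"]
    url = urlparse(decode_eks_token(token))
    query = parse_qs(url.query)  # also percent-decodes values such as %3B -> ;
    return {
        "host": url.netloc,
        "action": query.get("Action", [""])[0],
        "signed_headers": query.get("X-Amz-SignedHeaders", [""])[0],
    }
```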

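3.4. Create a kubeconfig

See: https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/create-kubeconfig.html

Generate the kubeconfig automatically with the following commands (update-kubeconfig needs an existing cluster, so run it once the cluster has been created):

# Generate (or update) the kubeconfig
aws eks --region <your region> update-kubeconfig --name <cluster name>
# Inspect the kubeconfig
cat ~/.kube/config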
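The generated kubeconfig holds no long-lived credential; its user entry has an `exec` section telling kubectl which command to run for a fresh token (with a recent AWS CLI this is typically `aws eks get-token`, otherwise `aws-iam-authenticator token`). A sketch for checking which one the current context uses, assuming the JSON produced by `kubectl config view -o json`; `current_auth_command` and `_find` are hypothetical helpers.

```python
import json


def _find(entries: list, name: str) -> dict:
    """Return the kubeconfig entry (context, user, ...) with the given name."""
    for entry in entries:
        if entry["name"] == name:
            return entry
    raise KeyError(f"{name!r} not found in kubeconfig")


def current_auth_command(kubeconfig_json: str) -> list:
    """Return the exec command line kubectl runs to get a token for the current context."""
    config = json.loads(kubeconfig_json)
    context = _find(config["contexts"], config["current-context"])["context"]
    user = _find(config["users"], context["user"])["user"]
    exec_cfg = user.get("exec")
    if exec_cfg is None:
        return []  # this user authenticates some other way (certificate, static token, ...)
    return [exec_cfg["command"], *exec_cfg.get("args", [])]
```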

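4. Getting Started

4.1. Create a cluster

Start creating the cluster with the following command. Under the hood, eksctl calls the CloudFormation API to launch two stacks, one that creates the EKS cluster and one that creates the EKS worker nodes, which together provision all the resources the cluster needs (gateways, IP addresses, the VPC, EC2 instances, and so on).

eksctl create cluster \
  --name prod \
  --version 1.13 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4 \
  --node-ami auto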
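When clusters are created from a script, the same flags can be assembled programmatically. A sketch with a hypothetical helper that only builds the argument list (pass it to `subprocess.run(args, check=True)` to actually run eksctl); it first checks that the desired node count lies between `--nodes-min` and `--nodes-max`, as the Auto Scaling group behind the node group requires.

```python
def eksctl_create_cluster_args(
    name: str,
    version: str,
    nodegroup_name: str,
    node_type: str,
    nodes: int,
    nodes_min: int,
    nodes_max: int,
    node_ami: str = "auto",
) -> list:
    """Build the `eksctl create cluster` argument list shown above."""
    # An Auto Scaling group's desired capacity must sit between its min and max.
    if not nodes_min <= nodes <= nodes_max:
        raise ValueError(
            f"need nodes-min <= nodes <= nodes-max, got {nodes_min} <= {nodes} <= {nodes_max}"
        )
    return [
        "eksctl", "create", "cluster",
        "--name", name,
        "--version", version,
        "--nodegroup-name", nodegroup_name,
        "--node-type", node_type,
        "--nodes", str(nodes),
        "--nodes-min", str(nodes_min),
        "--nodes-max", str(nodes_max),
        "--node-ami", node_ami,
    ]
```

For example, `eksctl_create_cluster_args("prod", "1.13", "standard-workers", "t3.medium", 3, 1, 4)` reproduces the command above.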

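Note: if you choose a P2 or P3 instance type together with the Amazon EKS-optimized AMI with GPU support, you must deploy the NVIDIA device plugin for Kubernetes to the cluster as a DaemonSet with the following command: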

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
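Once the plugin's DaemonSet pods are running, each GPU node should advertise the extended resource `nvidia.com/gpu` under `status.allocatable`, which Pods then request in their resource limits. A sketch for checking this, assuming the JSON from `kubectl get nodes -o json`; the helper name is hypothetical.

```python
import json


def gpus_per_node(nodes_json: str) -> dict:
    """Map node name -> number of NVIDIA GPUs the node advertises as allocatable.

    Nodes without the device plugin (or without GPUs) report 0.
    """
    nodes = json.loads(nodes_json)["items"]
    return {
        node["metadata"]["name"]: int(
            node["status"].get("allocatable", {}).get("nvidia.com/gpu", "0")
        )
        for node in nodes
    }
```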

4.2. Check cluster status

# Show node status
kubectl get nodes
# Show services
kubectl get svc
# Show events across all namespaces
kubectl get events --all-namespaces
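Right after `eksctl create cluster` finishes, nodes can take a short while to report Ready. A sketch for checking that every node is Ready, assuming the JSON from `kubectl get nodes -o json`, where readiness is the `Ready` entry in `status.conditions`; the helper names are hypothetical.

```python
import json


def node_readiness(nodes_json: str) -> dict:
    """Map node name -> True if its Ready condition has status "True"."""
    result = {}
    for node in json.loads(nodes_json)["items"]:
        conditions = node["status"].get("conditions", [])
        ready = any(c["type"] == "Ready" and c["status"] == "True" for c in conditions)
        result[node["metadata"]["name"]] = ready
    return result


def all_nodes_ready(nodes_json: str, expected: int) -> bool:
    """True when at least `expected` nodes exist and every one of them is Ready."""
    readiness = node_readiness(nodes_json)
    return len(readiness) >= expected and all(readiness.values())
```

With the cluster above, `expected` would be 3 (the `--nodes` value).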

4.3. Deploy the Dashboard

See:

  1. https://aws.amazon.com/cn/premiumsupport/knowledge-center/eks-cluster-kubernetes-dashboard/
  2. https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/dashboard-tutorial.html
  3. https://www.youtube.com/watch?v=JcZJqSa65Yc

# Deploy the Kubernetes Dashboard to the cluster
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
# Deploy heapster to enable container cluster monitoring and performance analysis
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
# Deploy the influxdb backend for heapster to the cluster
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
# Create the heapster cluster role binding for the Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
# Create a new service account with cluster-admin permissions
cat > eks-admin-service-account.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eks-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eks-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: eks-admin
  namespace: kube-system
EOF
# Apply the service account and cluster role binding to your cluster
kubectl apply -f eks-admin-service-account.yaml
# Retrieve the eks-admin service account's authentication token; copy the <authentication_token> value from the output and use it to sign in to the Dashboard
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
# Forward all requests to local port 6443 on your Amazon EC2 instance to the Kubernetes Dashboard port
kubectl port-forward svc/kubernetes-dashboard -n kube-system 6443:443
# From your local computer, reach that port through an SSH tunnel
ssh -i EC2KeyPair.pem ec2-user@IP -L 6443:127.0.0.1:6443

Open https://127.0.0.1:6443 in a browser, choose Token, and paste the token to sign in to the Dashboard.
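The token printed by `describe secret` is a JWT issued for the service account, so before pasting it into the Dashboard you can confirm it belongs to `eks-admin`: its `sub` claim should read `system:serviceaccount:kube-system:eks-admin`. A sketch assuming the secret is fetched with `kubectl -n kube-system get secret <secret name> -o json`, where `data.token` is standard base64; the helpers are hypothetical and only decode the payload without verifying the signature. The token grants cluster-admin, so keep it out of logs.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def token_from_secret(secret_json: str) -> str:
    """Extract the bearer token from `kubectl get secret <name> -o json` output."""
    secret = json.loads(secret_json)
    # Values under `data` are standard base64, unlike the JWT segments.
    return base64.b64decode(secret["data"]["token"]).decode("utf-8")


def token_subject(token: str) -> str:
    """Return the `sub` claim of a JWT without verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))["sub"]
```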

4.4. Delete the cluster

eksctl delete cluster --region=<your region> --name=<cluster name>

4.5. More operations

See:

  • https://kubernetes.io/docs/tutorials/