[Kubernetes Series, Part 5] Using Kubernetes on AWS: EKS

  • October 3, 2019
  • Notes

1. Overview

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy to run Kubernetes on AWS without having to stand up or maintain your own Kubernetes control plane.

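Amazon EKS runs the Kubernetes control plane instances across multiple Availability Zones to ensure high availability. It automatically detects and replaces unhealthy control plane instances, and it provides automated version upgrades and patching for them.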

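Amazon EKS also integrates with many AWS services to provide scalability and security for your applications, including:

  • Amazon ECR for container images
  • Elastic Load Balancing for load distribution
  • IAM for authentication
  • Amazon VPC for isolation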

2. Versions

Kubernetes version | Kubernetes release date | EKS platform version | EKS release notes
1.13.7 | 2019.6.7 | eks.1 | Initial release of Kubernetes 1.13 for Amazon EKS. For more information, see Kubernetes 1.13.
1.12.6 | 2019.2.27 | eks.2 | New platform version to support custom DNS names in the Kubelet certificate and improve etcd performance. This fixes a bug that caused worker node Kubelet daemons to request a new certificate every few seconds.
1.12.6 | 2019.2.27 | eks.1 | Initial release of Kubernetes 1.12 for Amazon EKS.
1.11.8 | 2019.3.1 | eks.3 | New platform version to support custom DNS names in the Kubelet certificate and improve etcd performance.
1.11.8 | 2019.3.1 | eks.2 | New platform version updating Amazon EKS Kubernetes 1.11 clusters to patch level 1.11.8 to address CVE-2019-1002100.

3. Prerequisites

3.1. Working environment

3.1.1 Python

  • Required version: >= 2.7.9
  • Purpose: installing the aws cli

3.1.2 aws cli

  • Required version: >= 1.16.156
  • Purpose: managing AWS resources
  • Installation:
pip install awscli --upgrade --user
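The minimum aws cli version can be checked with a version-aware comparison. A plain string comparison gets cases like 1.9 vs 1.16 wrong. The sketch below uses hypothetical helpers. It assumes GNU `sort -V` is available, and that `aws --version` prints the `aws-cli/<version> ...` format used by aws cli v1.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed (exit 0) when version A >= version B.
# Assumes GNU coreutils `sort -V` (natural version ordering).
version_ge() {
  # After version-sorting both values, the smaller one must be B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# aws_cli_version: read `aws --version` output on stdin and print only the
# version number, e.g. "aws-cli/1.16.156 Python/2.7.16 ..." -> "1.16.156".
aws_cli_version() {
  sed -n 's|^aws-cli/\([0-9.]*\).*|\1|p'
}

# Usage sketch (aws cli v1 prints its version on stderr, hence 2>&1):
# v="$(aws --version 2>&1 | aws_cli_version)"
# version_ge "$v" 1.16.156 || echo "aws cli $v is older than 1.16.156"
```

The same `version_ge` helper works for the Python and eksctl minimums, as long as you extract a bare version number first.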

3.1.3 eksctl

  • Required version: >= 0.1.37
  • Purpose: managing Amazon EKS resources
  • Installation:
curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version

3.1.4 kubectl

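  • Required version: within one minor version of the cluster's Kubernetes version (for a 1.13 cluster, kubectl 1.12 to 1.14)
  • Purpose: operating the Kubernetes cluster
  • Installation: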
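# Downloads the latest stable kubectl; to match an older cluster, replace the stable.txt lookup with a specific version tag such as v1.13.7
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
# No cluster exists yet, so only check the client version
kubectl version --client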
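The kubectl version rule above can be checked mechanically. This sketch compares only the minor numbers and assumes both versions have major version 1. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# minor_of VERSION: print the minor number of "v1.13.7", "1.13" or "v1.13.7-eks-xxxx".
minor_of() {
  v="${1#v}"               # drop an optional leading "v"
  v="${v#*.}"              # drop the major part: "13.7" or "13"
  printf '%s\n' "${v%%.*}" # keep everything before the next dot: "13"
}

# kubectl_skew_ok CLIENT SERVER: succeed when the kubectl client is at most one
# minor version older or newer than the cluster (major version 1 assumed for both).
kubectl_skew_ok() {
  c="$(minor_of "$1")"
  s="$(minor_of "$2")"
  d=$(( c - s ))
  [ "$d" -ge -1 ] && [ "$d" -le 1 ]
}

# Usage sketch: take the client and server versions shown by `kubectl version`
# once the cluster exists, e.g.
# kubectl_skew_ok v1.14.0 v1.13.7 && echo "kubectl version is compatible"
```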

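3.2. IAM permissions

References:

  1. Amazon EKS identity-based policy examples
  2. https://github.com/weaveworks/eksctl/issues/204#issuecomment-450280786 (the commenter says it took him more than 30 attempts to complete this list; it took me nearly 40)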
  3. https://docs.aws.amazon.com/autoscaling/ec2/userguide/control-access-using-iam.html

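Note: make sure the target region has spare quota for gateways, VPCs, and IP addresses; otherwise creation fails with a "maximum limit reached" error.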

3.2.1. CloudFormation full access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:*"
      ],
      "Resource": "*"
    }
  ]
}

3.2.2. EKS read/write access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "eks:ListClusters",
        "eks:CreateCluster"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "eks:UpdateClusterVersion",
        "eks:ListUpdates",
        "eks:DescribeUpdate",
        "eks:DescribeCluster"
      ],
      "Resource": "arn:aws:eks:*:*:cluster/*"
    }
  ]
}

3.2.3. EC2 permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateInternetGateway",
        "ec2:CreateVpc",
        "ec2:Describe*",
        "ec2:CreateTags",
        "ec2:ModifyVpcAttribute",
        "ec2:CreateSubnet",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:DeleteSecurityGroup",
        "ec2:AttachInternetGateway",
        "ec2:CreateRoute",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:AssociateRouteTable",
        "ec2:CreateNatGateway",
        "ec2:AllocateAddress",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteNatGateway",
        "ec2:DeleteRoute",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSubnet",
        "ec2:DeleteTags",
        "ec2:DeleteVpc",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeNatGateways",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeTags",
        "ec2:DescribeVpcAttribute",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateRouteTable",
        "ec2:RunInstances",
        "ec2:ReleaseAddress"
      ],
      "Resource": "*"
    }
  ]
}

3.2.4. CloudWatch permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:Describe*"
      ],
      "Resource": "*"
    }
  ]
}

3.2.5. Auto Scaling permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:CreateAutoScalingGroup",
        "autoscaling:DeleteAutoScalingGroup",
        "autoscaling:DeleteLaunchConfiguration",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:UpdateAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}

3.2.6. Elastic Load Balancing permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:Describe*",
      "Resource": "*"
    }
  ]
}

3.2.7. IAM permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:DetachRolePolicy",
        "iam:GetRole",
        "iam:PassRole",
        "iam:CreateInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:GetInstanceProfile",
        "iam:PutRolePolicy",
        "iam:DeleteRolePolicy",
        "iam:GetRolePolicy",
        "iam:ListInstanceProfiles",
        "iam:CreateServiceLinkedRole",
        "iam:ListInstanceProfilesForRole"
      ],
      "Resource": "*"
    }
  ]
}

3.2.8. Launch template and launch configuration permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "autoscaling:CreateLaunchConfiguration",
        "ec2:DeleteLaunchTemplate",
        "ec2:ModifyLaunchTemplate",
        "ec2:DeleteLaunchTemplateVersions",
        "ec2:CreateLaunchTemplateVersion"
      ],
      "Resource": [
        "arn:aws:autoscaling:*:*:launchConfiguration:*:launchConfigurationName/*",
        "arn:aws:ec2:*:*:launch-template/*"
      ]
    }
  ]
}
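Each policy in this section can be saved to a file and attached to the IAM user that runs eksctl. This is a minimal sketch that uses the CloudWatch policy from 3.2.4 as the example. The function names, the user name `eks-operator`, and the policy name `EksCloudWatchRead` are hypothetical placeholders. `aws iam create-policy` and `aws iam attach-user-policy` are standard aws cli commands, and running them requires IAM administrative rights.

```shell
#!/usr/bin/env bash
# write_cloudwatch_policy FILE: write the CloudWatch policy from 3.2.4 to FILE.
write_cloudwatch_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:Describe*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# attach_policy USER POLICY_NAME FILE: create a customer managed policy from FILE
# and attach it to USER (hypothetical helper; needs a configured aws cli).
attach_policy() {
  arn="$(aws iam create-policy --policy-name "$2" \
    --policy-document "file://$3" --query 'Policy.Arn' --output text)" || return 1
  aws iam attach-user-policy --user-name "$1" --policy-arn "$arn"
}

# Usage sketch (not executed here):
# write_cloudwatch_policy eks-cloudwatch.json
# attach_policy eks-operator EksCloudWatchRead eks-cloudwatch.json
```

Writing the JSON through a quoted heredoc (`<<'EOF'`) keeps the shell from expanding the `*` and `$` characters inside the policy.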

3.3. Installing aws-iam-authenticator

See: https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/install-aws-iam-authenticator.html

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.13.7/2019-06-11/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$HOME/bin:$PATH
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
# Generate an authentication token for the cluster
aws-iam-authenticator token -i <cluster name>
# Show which IAM identity the aws cli is currently using
aws sts get-caller-identity
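If you run the install steps above twice, they append the same `export` line to `~/.bashrc` a second time. The hypothetical helpers below are one way to make the PATH change idempotent. They are a sketch, not part of the official install instructions.

```shell
#!/usr/bin/env bash
# path_contains DIR: succeed if DIR is already one of the entries in $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# ensure_line FILE LINE: append LINE to FILE only if that exact line is not already there.
ensure_line() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage sketch:
# path_contains "$HOME/bin" || export PATH="$HOME/bin:$PATH"
# ensure_line ~/.bashrc 'export PATH=$HOME/bin:$PATH'
```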

3.4. Creating a kubeconfig

See: https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/create-kubeconfig.html

Generate the kubeconfig automatically with the following commands:

# Generate (or update) the kubeconfig entry for the cluster
aws eks --region <your region> update-kubeconfig --name <cluster name>
# Inspect the resulting kubeconfig
cat ~/.kube/config
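By default, `aws eks update-kubeconfig` names the new kubeconfig context after the cluster ARN. The small sketch below builds that name so you can switch to the context explicitly. It assumes the standard `aws` partition; the AWS China regions use `aws-cn` in ARNs. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# eks_cluster_arn REGION ACCOUNT_ID NAME: print the EKS cluster ARN
# (standard "aws" partition assumed).
eks_cluster_arn() {
  printf 'arn:aws:eks:%s:%s:cluster/%s\n' "$1" "$2" "$3"
}

# Usage sketch (needs a configured aws cli and kubectl):
# account="$(aws sts get-caller-identity --query Account --output text)"
# kubectl config use-context "$(eks_cluster_arn us-west-2 "$account" prod)"
# kubectl config current-context
```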

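4. Getting started

4.1. Creating a cluster

Create the cluster with the command below. Under the hood, eksctl calls the AWS CloudFormation API and launches two stacks: one creates the EKS cluster and the other creates the EKS worker nodes. Together, these stacks create every resource the cluster needs, including gateways, IP addresses, the VPC, and EC2 instances.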

eksctl create cluster \
  --name prod \
  --version 1.13 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4 \
  --node-ami auto
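The same cluster can also be described in an eksctl config file instead of flags, which is easier to keep under version control. This sketch assumes an eksctl release that supports `eksctl create cluster -f` and the `eksctl.io/v1alpha5` ClusterConfig schema. The helper function name is hypothetical. The region is passed in explicitly because the command above relies on the default region.

```shell
#!/usr/bin/env bash
# write_cluster_config FILE REGION: write a ClusterConfig mirroring the flags of the
# `eksctl create cluster` command above (name prod, Kubernetes 1.13, 3 x t3.medium).
write_cluster_config() {
  cat > "$1" <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: prod
  region: $2
  version: "1.13"

nodeGroups:
  - name: standard-workers
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 1
    maxSize: 4
    ami: auto
EOF
}

# Usage sketch:
# write_cluster_config cluster.yaml us-west-2
# eksctl create cluster -f cluster.yaml
```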

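Note: if you choose a P2 or P3 instance type together with the Amazon EKS-optimized AMI with GPU support, you must deploy the NVIDIA device plugin for Kubernetes as a DaemonSet on the cluster using the following command: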

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
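Whether the plugin is needed depends only on the instance family. The hypothetical helper below covers just the P2 and P3 families named in the note. Other GPU families are deliberately not handled, so extend the pattern if you use them.

```shell
#!/usr/bin/env bash
# needs_nvidia_plugin INSTANCE_TYPE: succeed for P2/P3 instance types
# (including p3dn.*), which need the NVIDIA device plugin DaemonSet.
needs_nvidia_plugin() {
  case "$1" in
    p2.*|p3.*|p3dn.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch:
# if needs_nvidia_plugin p3.2xlarge; then
#   kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
# fi
```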

4.2. Checking cluster status

# Node status
kubectl get nodes
# Services
kubectl get svc
# Events in all namespaces
kubectl get events --all-namespaces
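Right after the cluster is created, some nodes may still be joining. The sketch below uses a hypothetical helper that counts the nodes whose STATUS column is exactly `Ready` in `kubectl get nodes --no-headers` output. A polling loop is built on top of it. The 30 tries and the 10-second interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# count_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and print
# how many lines have exactly "Ready" in the second (STATUS) column.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# wait_for_nodes N [TRIES]: poll every 10 seconds until at least N nodes are Ready
# (needs kubectl and a working kubeconfig).
wait_for_nodes() {
  want="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    ready="$(kubectl get nodes --no-headers 2>/dev/null | count_ready_nodes)"
    [ "$ready" -ge "$want" ] && return 0
    sleep 10
    i=$((i + 1))
  done
  return 1
}

# Usage sketch: wait_for_nodes 3 || echo "nodes did not become Ready in time"
```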

4.3. Deploying the Dashboard

See:

  1. https://aws.amazon.com/cn/premiumsupport/knowledge-center/eks-cluster-kubernetes-dashboard/
  2. https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/dashboard-tutorial.html
  3. https://www.youtube.com/watch?v=JcZJqSa65Yc
# Deploy the Kubernetes Dashboard to the cluster
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
# Deploy heapster to enable container cluster monitoring and performance analysis
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
# Deploy the influxdb backend for heapster
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
# Create the heapster cluster role binding for the Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
# Create a service account bound to the cluster-admin role
cat > eks-admin-service-account.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eks-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eks-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: eks-admin
  namespace: kube-system
EOF
# Apply the service account and cluster role binding to the cluster
kubectl apply -f eks-admin-service-account.yaml
# Retrieve the eks-admin authentication token; copy the value of the "token:" field from the output and use it to log in to the Dashboard
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
# On the EC2 instance: forward local port 6443 to the Dashboard service port (runs in the foreground)
kubectl port-forward svc/kubernetes-dashboard -n kube-system 6443:443
# On your local machine: open an SSH tunnel to that port
ssh -i EC2KeyPair.pem ec2-user@IP -L 6443:127.0.0.1:6443

Then open https://127.0.0.1:6443 in your local browser and enter the token to log in to the Dashboard.
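The `describe secret` output contains several fields, but the login page needs only the value on the `token:` line. The hypothetical filter below extracts that value. It assumes the output keeps the `token:   <value>` layout.

```shell
#!/usr/bin/env bash
# extract_token: read `kubectl describe secret` output on stdin and print only
# the value that follows the "token:" label.
extract_token() {
  awk '$1 == "token:" { print $2 }'
}

# Usage sketch (same secret lookup as in the commands above):
# secret="$(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')"
# kubectl -n kube-system describe secret "$secret" | extract_token
```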

4.4. Deleting the cluster

eksctl delete cluster --region=<your region> --name=<cluster name>

4.5. More operations

See:

  • https://kubernetes.io/docs/tutorials/