Kubernetes and HPA: custom metrics through the Prometheus adapter
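Autoscaling is a way to automatically scale workloads up or down based on resource usage. Autoscaling in Kubernetes has two dimensions: the Cluster Autoscaler, which handles node scaling operations, and the Horizontal Pod Autoscaler, which automatically scales the number of pods in a deployment or replica set. Cluster autoscaling and the Horizontal Pod Autoscaler are used together to dynamically adjust both the computing power and the level of parallelism your system needs to meet its SLAs. While the Cluster Autoscaler depends heavily on the underlying capabilities of the cloud provider hosting your cluster, the HPA can operate independently of your IaaS/PaaS provider.
The Horizontal Pod Autoscaler feature was first introduced in Kubernetes v1.1 and has evolved a lot since then. Version 1 of the HPA scaled pods based on observed CPU utilization and, later, on memory usage. Kubernetes 1.6 introduced a new API, the Custom Metrics API, which gives the HPA access to arbitrary metrics. Kubernetes 1.7 introduced the aggregation layer, which allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. Together, the Custom Metrics API and the aggregation layer make it possible for monitoring systems such as Prometheus to expose application-specific metrics to the HPA controller.
The Horizontal Pod Autoscaler is implemented as a control loop that periodically queries the Resource Metrics API for core metrics such as CPU and memory, and the Custom Metrics API for application-specific metrics.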
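The calculation at the heart of that control loop can be sketched in a few lines of Python. This is only an illustration of the replica formula given in the Kubernetes documentation, desired = ceil(current * currentValue / targetValue), not the controller's actual code. The function name and the 0.1 tolerance default are assumptions made for this example.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count suggested by a single metric.

    tolerance=0.1 is an assumed default: when the current/target ratio is
    within 10% of 1.0, the replica count is left unchanged.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical readings against an 80% CPU target.
print(desired_replicas(2, 160, 80))  # load is twice the target
print(desired_replicas(4, 84, 80))   # inside the tolerance band
print(desired_replicas(4, 20, 80))   # load well below the target
```

The result still has to be clamped to the HPA's minReplicas and maxReplicas, which this sketch leaves out.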
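What follows is a step-by-step guide to configuring HPA v2 on Kubernetes 1.9 or later. You will install the Metrics Server add-on, which supplies the core metrics, and then use a demo app to showcase pod autoscaling based on CPU and memory usage. In the second part of the guide you will deploy Prometheus and a custom API server. You will register the custom API server with the aggregation layer and then configure the HPA with custom metrics supplied by the demo application.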
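Before you begin, install Go 1.8 or later and clone the k8s-prom-hpa repo into your GOPATH. The kubectl commands in this guide use paths relative to the root of the cloned repository.
cd $GOPATH
git clone https://github.com/stefanprodan/k8s-prom-hpa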
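Deploying the Metrics Server
The Kubernetes Metrics Server is a cluster-wide aggregator of resource usage data and is the successor to Heapster. The metrics server collects CPU and memory usage for nodes and pods by pooling data from the kubernetes.summary_api. The summary API is a memory-efficient API for passing data from Kubelet/cAdvisor to the metrics server.
In the first version of the HPA you needed Heapster to provide CPU and memory metrics. With HPA v2 on Kubernetes 1.8, the metrics server is required only when horizontal-pod-autoscaler-use-rest-clients is turned on. The HPA REST client is enabled by default in Kubernetes 1.9. GKE 1.9 ships with the Metrics Server pre-installed.
Deploy the Metrics Server in the kube-system namespace: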
kubectl create -f ./metrics-server
After about a minute, the metrics server starts reporting CPU and memory usage for nodes and pods.
View the node metrics:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
The output looks like this:
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "ip-10-1-50-61.ec2.internal",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/ip-10-1-50-61.ec2.internal",
        "creationTimestamp": "2019-02-13T08:34:05Z"
      },
      "timestamp": "2019-02-13T08:33:38Z",
      "window": "30s",
      "usage": {
        "cpu": "78322168n",
        "memory": "563180Ki"
      }
    },
    {
      "metadata": {
        "name": "ip-10-1-57-40.ec2.internal",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/ip-10-1-57-40.ec2.internal",
        "creationTimestamp": "2019-02-13T08:34:05Z"
      },
      "timestamp": "2019-02-13T08:33:42Z",
      "window": "30s",
      "usage": {
        "cpu": "48926263n",
        "memory": "554472Ki"
      }
    },
    {
      "metadata": {
        "name": "ip-10-1-62-29.ec2.internal",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/ip-10-1-62-29.ec2.internal",
        "creationTimestamp": "2019-02-13T08:34:05Z"
      },
      "timestamp": "2019-02-13T08:33:36Z",
      "window": "30s",
      "usage": {
        "cpu": "36700681n",
        "memory": "326088Ki"
      }
    }
  ]
}
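The usage values in this output use Kubernetes quantity notation: the `n` suffix on CPU means nanocores, and `Ki` on memory means kibibytes. As a rough aid for reading them, here is a small Python sketch that converts the three node readings above into millicores and MiB. The `parse_quantity` helper is hypothetical and only covers the suffixes that appear in this guide; it is not a full implementation of the Kubernetes quantity format.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes that show up in this guide (partial list).
SUFFIXES = {
    "n": Decimal("1e-9"),
    "u": Decimal("1e-6"),
    "m": Decimal("1e-3"),
    "Ki": Decimal(1024),
    "Mi": Decimal(1024) ** 2,
    "Gi": Decimal(1024) ** 3,
}


def parse_quantity(text):
    """Return the quantity in base units (cores for CPU, bytes for memory)."""
    # Check two-letter suffixes before one-letter ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


# Node readings copied from the NodeMetricsList output above.
nodes = [
    ("ip-10-1-50-61.ec2.internal", "78322168n", "563180Ki"),
    ("ip-10-1-57-40.ec2.internal", "48926263n", "554472Ki"),
    ("ip-10-1-62-29.ec2.internal", "36700681n", "326088Ki"),
]

for name, cpu, memory in nodes:
    millicores = parse_quantity(cpu) * 1000
    mebibytes = parse_quantity(memory) / (1024 ** 2)
    print(f"{name}: {millicores:.1f} millicores, {mebibytes:.0f} MiB")
```

Decimal is used instead of float so that nanocore values convert without rounding noise.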
View the pod metrics:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .
The output looks like this:
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": { "selfLink": "/apis/metrics.k8s.io/v1beta1/pods" },
  "items": [
    {
      "metadata": { "name": "kube-proxy-77nt2", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-proxy-77nt2", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:00Z", "window": "30s",
      "containers": [ { "name": "kube-proxy", "usage": { "cpu": "2370555n", "memory": "13184Ki" } } ]
    },
    {
      "metadata": { "name": "cluster-autoscaler-n2xsl", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/cluster-autoscaler-n2xsl", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:12Z", "window": "30s",
      "containers": [ { "name": "cluster-autoscaler", "usage": { "cpu": "1477997n", "memory": "54584Ki" } } ]
    },
    {
      "metadata": { "name": "core-dns-autoscaler-b4785d4d7-j64xd", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/core-dns-autoscaler-b4785d4d7-j64xd", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:08Z", "window": "30s",
      "containers": [ { "name": "autoscaler", "usage": { "cpu": "191293n", "memory": "7956Ki" } } ]
    },
    {
      "metadata": { "name": "spot-interrupt-handler-8t2xk", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/spot-interrupt-handler-8t2xk", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:04Z", "window": "30s",
      "containers": [ { "name": "spot-interrupt-handler", "usage": { "cpu": "844907n", "memory": "4608Ki" } } ]
    },
    {
      "metadata": { "name": "kube-proxy-t5kqm", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-proxy-t5kqm", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:08Z", "window": "30s",
      "containers": [ { "name": "kube-proxy", "usage": { "cpu": "1194766n", "memory": "12204Ki" } } ]
    },
    {
      "metadata": { "name": "kube-proxy-zxmqb", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-proxy-zxmqb", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:06Z", "window": "30s",
      "containers": [ { "name": "kube-proxy", "usage": { "cpu": "3021117n", "memory": "13628Ki" } } ]
    },
    {
      "metadata": { "name": "aws-node-rcz5c", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/aws-node-rcz5c", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:15Z", "window": "30s",
      "containers": [ { "name": "aws-node", "usage": { "cpu": "1217989n", "memory": "24976Ki" } } ]
    },
    {
      "metadata": { "name": "aws-node-z2qxs", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/aws-node-z2qxs", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:15Z", "window": "30s",
      "containers": [ { "name": "aws-node", "usage": { "cpu": "1025780n", "memory": "46424Ki" } } ]
    },
    {
      "metadata": { "name": "php-apache-899d75b96-8ppk4", "namespace": "default", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/php-apache-899d75b96-8ppk4", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:08Z", "window": "30s",
      "containers": [ { "name": "php-apache", "usage": { "cpu": "24612n", "memory": "27556Ki" } } ]
    },
    {
      "metadata": { "name": "load-generator-779c5f458c-9sglg", "namespace": "default", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/load-generator-779c5f458c-9sglg", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:34:56Z", "window": "30s",
      "containers": [ { "name": "load-generator", "usage": { "cpu": "0", "memory": "336Ki" } } ]
    },
    {
      "metadata": { "name": "aws-node-v9jxs", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/aws-node-v9jxs", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:00Z", "window": "30s",
      "containers": [ { "name": "aws-node", "usage": { "cpu": "1303458n", "memory": "28020Ki" } } ]
    },
    {
      "metadata": { "name": "kube2iam-m2ktt", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube2iam-m2ktt", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:11Z", "window": "30s",
      "containers": [ { "name": "kube2iam", "usage": { "cpu": "1328864n", "memory": "9724Ki" } } ]
    },
    {
      "metadata": { "name": "kube2iam-w9cqf", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube2iam-w9cqf", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:03Z", "window": "30s",
      "containers": [ { "name": "kube2iam", "usage": { "cpu": "1294379n", "memory": "8812Ki" } } ]
    },
    {
      "metadata": { "name": "custom-metrics-apiserver-657644489c-pk8rb", "namespace": "monitoring", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/monitoring/pods/custom-metrics-apiserver-657644489c-pk8rb", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:04Z", "window": "30s",
      "containers": [ { "name": "custom-metrics-apiserver", "usage": { "cpu": "22409370n", "memory": "42468Ki" } } ]
    },
    {
      "metadata": { "name": "kube2iam-qghgt", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube2iam-qghgt", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:11Z", "window": "30s",
      "containers": [ { "name": "kube2iam", "usage": { "cpu": "2078992n", "memory": "16356Ki" } } ]
    },
    {
      "metadata": { "name": "spot-interrupt-handler-ps745", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/spot-interrupt-handler-ps745", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:10Z", "window": "30s",
      "containers": [ { "name": "spot-interrupt-handler", "usage": { "cpu": "611566n", "memory": "4336Ki" } } ]
    },
    {
      "metadata": { "name": "coredns-68fb7946fb-2xnpp", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/coredns-68fb7946fb-2xnpp", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:12Z", "window": "30s",
      "containers": [ { "name": "coredns", "usage": { "cpu": "1610381n", "memory": "10480Ki" } } ]
    },
    {
      "metadata": { "name": "coredns-68fb7946fb-9ctjf", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/coredns-68fb7946fb-9ctjf", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:13Z", "window": "30s",
      "containers": [ { "name": "coredns", "usage": { "cpu": "1418850n", "memory": "9852Ki" } } ]
    },
    {
      "metadata": { "name": "prometheus-7d4f6d4454-v4fnd", "namespace": "monitoring", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/monitoring/pods/prometheus-7d4f6d4454-v4fnd", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:00Z", "window": "30s",
      "containers": [ { "name": "prometheus", "usage": { "cpu": "17951807n", "memory": "202316Ki" } } ]
    },
    {
      "metadata": { "name": "metrics-server-7cdd54ccb4-k2x7m", "namespace": "kube-system", "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/metrics-server-7cdd54ccb4-k2x7m", "creationTimestamp": "2019-02-13T08:35:19Z" },
      "timestamp": "2019-02-13T08:35:04Z", "window": "30s",
      "containers": [
        { "name": "metrics-server-nanny", "usage": { "cpu": "144656n", "memory": "5716Ki" } },
        { "name": "metrics-server", "usage": { "cpu": "568327n", "memory": "16268Ki" } }
      ]
    }
  ]
}
Autoscaling based on CPU and memory usage
You will use a small Golang-based web app to test the Horizontal Pod Autoscaler (HPA).
Deploy podinfo to the default namespace:
kubectl create -f ./podinfo/podinfo-svc.yaml,./podinfo/podinfo-dep.yaml
Access podinfo through the NodePort service at http://<K8S_PUBLIC_IP>:31198.
Next, define an HPA that maintains a minimum of two replicas and scales up to ten if the CPU average goes over 80% or the memory goes over 200Mi:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
Create the HPA:
kubectl create -f ./podinfo/podinfo-hpa.yaml
After a couple of seconds the HPA controller contacts the metrics server and fetches the CPU and memory usage:
kubectl get hpa
NAME      REFERENCE            TARGETS                      MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   2826240 / 200Mi, 15% / 80%   2         10        2          5m
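When an HPA lists more than one metric, the Kubernetes documentation describes the controller as computing a proposed replica count per metric and taking the largest, bounded by minReplicas and maxReplicas. The Python sketch below applies that rule to the readings in the TARGETS column above. The helper names are made up for this example, and the tolerance band the real controller applies is left out for brevity.

```python
import math


def replicas_for_metric(current_replicas, current_value, target_value):
    """Proposal from one metric: ceil(current * currentValue / targetValue)."""
    return math.ceil(current_replicas * current_value / target_value)


def combine(current_replicas, metrics, min_replicas, max_replicas):
    """Take the largest per-metric proposal, then clamp it to the HPA bounds."""
    proposals = [
        replicas_for_metric(current_replicas, current, target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


MI = 1024 * 1024

# (current, target) pairs read from the TARGETS column: CPU 15% / 80%,
# memory 2826240 bytes / 200Mi.
idle = [(15, 80), (2826240, 200 * MI)]
print(combine(2, idle, min_replicas=2, max_replicas=10))

# Hypothetical reading under load: CPU at 160% of the request, memory unchanged.
busy = [(160, 80), (2826240, 200 * MI)]
print(combine(2, busy, min_replicas=2, max_replicas=10))
```

With the idle readings both metrics propose a single replica, so it is the minReplicas bound that keeps the deployment at two pods.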
To increase the CPU usage, run a load test with rakyll/hey:
#install hey
go get -u github.com/rakyll/hey
#do 10K requests
hey -n 10000 -q 10 -c 5 http://<K8S_PUBLIC_IP>:31198/
You can monitor the HPA events with:
$ kubectl describe hpa
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  7m    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  3m    horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target
Remove podinfo for the moment. You will deploy it again later in this tutorial:
kubectl delete -f ./podinfo/podinfo-hpa.yaml,./podinfo/podinfo-dep.yaml,./podinfo/podinfo-svc.yaml
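Deploying the Custom Metrics Server
To scale based on custom metrics you need two components. The first collects metrics from your applications and stores them in the Prometheus time series database. The second, the Prometheus adapter, extends the Kubernetes Custom Metrics API with the metrics supplied by the collector.
You will deploy Prometheus and the adapter in a dedicated namespace.
Create the monitoring namespace: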
kubectl create -f ./namespaces.yaml
Deploy Prometheus v2 in the monitoring namespace:
kubectl create -f ./prometheus
Generate the TLS certificates needed by the Prometheus adapter:
make certs
This generates the following files:
# ls output
apiserver.csr  apiserver-key.pem  apiserver.pem
Deploy the Prometheus custom metrics API adapter:
kubectl create -f ./custom-metrics-api
List the custom metrics provided by Prometheus:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
Get the file system usage for all pods in the monitoring namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/*/fs_usage_bytes" | jq .
The query returns:
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/%2A/fs_usage_bytes"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "monitoring",
        "name": "custom-metrics-apiserver-657644489c-pk8rb",
        "apiVersion": "/v1"
      },
      "metricName": "fs_usage_bytes",
      "timestamp": "2019-02-13T08:52:30Z",
      "value": "94253056"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "monitoring",
        "name": "prometheus-7d4f6d4454-v4fnd",
        "apiVersion": "/v1"
      },
      "metricName": "fs_usage_bytes",
      "timestamp": "2019-02-13T08:52:30Z",
      "value": "24576"
    }
  ]
}
Autoscaling based on custom metrics
Create the podinfo NodePort service and deployment in the default namespace:
kubectl create -f ./podinfo/podinfo-svc.yaml,./podinfo/podinfo-dep.yaml
The podinfo app exposes a custom metric named http_requests_total. The Prometheus adapter removes the _total suffix and marks the metric as a counter metric.
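A counter such as http_requests_total only ever increases, so it has to be turned into a per-second rate before it is useful as a scaling signal. The Python sketch below shows the idea with two hypothetical samples taken one minute apart. It is a simplification: it does not handle counter resets or the extrapolation that Prometheus's own rate() performs, and both helper names are invented for this example.

```python
def exposed_name(series_name):
    """Drop the _total suffix, as described for the adapter above."""
    suffix = "_total"
    if series_name.endswith(suffix):
        return series_name[: -len(suffix)]
    return series_name


def per_second_rate(samples):
    """Average increase per second between the first and last sample.

    samples is a list of (unix_seconds, counter_value) pairs, oldest first.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical scrape results: 54 requests served over 60 seconds.
samples = [(0, 1000), (60, 1054)]
print(exposed_name("http_requests_total"), per_second_rate(samples))
```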
Get the total requests per second from the custom metrics API:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "podinfo-6b86c8ccc9-kv5g9",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests",
      "timestamp": "2018-01-10T16:49:07Z",
      "value": "901m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "podinfo-6b86c8ccc9-nm7bl",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests",
      "timestamp": "2018-01-10T16:49:07Z",
      "value": "898m"
    }
  ]
}
Create an HPA that scales up the podinfo deployment if the number of requests goes over 10 per second:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests
      targetAverageValue: 10
Deploy the podinfo HPA in the default namespace:
kubectl create -f ./podinfo/podinfo-hpa-custom.yaml
After a couple of seconds the HPA fetches the http_requests value from the metrics API:
kubectl get hpa
NAME      REFERENCE            TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   899m / 10   2         10        2          1m
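The `m` suffix in the TARGETS column means thousandths, so 899m is roughly 0.9 requests per second per pod. With a Pods-type metric the HPA compares the average across pods with targetAverageValue. As an illustration only, the Python sketch below averages the two per-pod samples returned by the custom metrics API earlier (901m and 898m); the `milli` helper is hypothetical.

```python
from fractions import Fraction


def milli(text):
    """Convert a value such as '901m' into an exact fraction of one unit."""
    if text.endswith("m"):
        return Fraction(int(text[:-1]), 1000)
    return Fraction(text)


# Per-pod values copied from the http_requests response shown earlier.
per_pod = {
    "podinfo-6b86c8ccc9-kv5g9": "901m",
    "podinfo-6b86c8ccc9-nm7bl": "898m",
}

average = sum(milli(value) for value in per_pod.values()) / len(per_pod)
target = 10

print(float(average))    # average requests per second per pod
print(average < target)  # below the target, so no scale-up is proposed
```

The average works out to just under 0.9, in line with the 899m reported by kubectl get hpa.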
Apply some load on the podinfo service at 25 requests per second:
#install hey
go get -u github.com/rakyll/hey
#do 10K requests rate limited at 25 QPS
hey -n 10000 -q 5 -c 5 http://<K8S_PUBLIC_IP>:31198/healthz
After a few minutes the HPA begins to scale up the deployment:
kubectl describe hpa
Name:                       podinfo
Namespace:                  default
Reference:                  Deployment/podinfo
Metrics:                    ( current / target )
  "http_requests" on pods:  9059m / 10
Min replicas:               2
Max replicas:               10
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 3; reason: pods metric http_requests above target
At the current rate of requests per second the deployment will never reach the maximum of 10 pods. Three replicas are enough to keep the RPS under 10 per pod.
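The arithmetic behind that claim is short enough to check by hand or with a few lines of Python. The sketch assumes that hey's -q flag is a per-worker rate limit, so -q 5 with -c 5 workers gives 25 requests per second in total, and that the load is spread evenly across pods. It ignores any other traffic the pods receive, such as probes or Prometheus scrapes.

```python
import math


def replicas_needed(total_rps, target_per_pod):
    """Smallest pod count that keeps the per-pod rate at or below the target."""
    return math.ceil(total_rps / target_per_pod)


workers = 5          # hey -c 5
rate_per_worker = 5  # hey -q 5, assumed to be a per-worker limit
total_rps = workers * rate_per_worker

target_per_pod = 10  # targetAverageValue in the HPA
pods = replicas_needed(total_rps, target_per_pod)

print(total_rps, pods)
print(total_rps / 2)     # per-pod rate with the initial two replicas
print(total_rps / pods)  # per-pod rate after scaling up
```

With two replicas each pod would see 12.5 requests per second, which is above the target; with three, the per-pod rate drops to about 8.3.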
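After the load test finishes, the HPA scales the deployment back down to its initial replica count:
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  5m    horizontal-pod-autoscaler  New size: 3; reason: pods metric http_requests above target
  Normal  SuccessfulRescale  21s   horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
You may have noticed that the autoscaler doesn't react immediately to usage spikes. By default the metrics sync happens once every 30 seconds, and scaling up or down can only happen if there was no rescaling within the last 3-5 minutes. In this way the HPA avoids rapidly executing conflicting decisions and gives the Cluster Autoscaler time to kick in.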
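The cooldown behaviour can be pictured as a simple gate in front of every scaling decision. The Python sketch below is one way to model it, under the assumption that "3-5 minutes" means a 3-minute wait before scaling up and a 5-minute wait before scaling down. The function and parameter names are invented for this example and do not correspond to the controller's code.

```python
def may_rescale(now, last_rescale, desired, current,
                up_cooldown=180, down_cooldown=300):
    """Decide whether a proposed replica change may be applied.

    Times are in seconds. The 180 s / 300 s defaults are an assumption
    based on the 3-5 minute window mentioned in the text.
    """
    if desired == current:
        return False  # nothing to change
    if last_rescale is None:
        return True   # no earlier rescale to wait for
    cooldown = up_cooldown if desired > current else down_cooldown
    return now - last_rescale >= cooldown


# A rescale happened at t=0; the metrics are re-evaluated every 30 seconds.
for now in range(30, 331, 30):
    up = may_rescale(now, 0, desired=4, current=3)
    down = may_rescale(now, 0, desired=2, current=3)
    print(f"t={now:3d}s scale-up allowed: {up}, scale-down allowed: {down}")
```

In this model a spike that arrives 60 seconds after the previous rescale has to wait another two minutes before more pods are added, which is the delay the text refers to.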
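Conclusions
Not all systems can meet their SLAs by relying on CPU and memory usage metrics alone; most web and mobile backends need to autoscale on requests per second to handle traffic bursts. For ETL applications, autoscaling could instead be triggered when, for example, the job queue length exceeds some threshold. By instrumenting your applications with Prometheus and exposing the right metrics for autoscaling, you can fine-tune your apps to handle bursts better and ensure high availability.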