Autoscaling in a Nutshell
Kubernetes workload resources (such as the Cloudentity platform deployment) can be automatically scaled to match demand based on custom metrics, multiple metrics, or metric APIs. For example, increased load may result in additional pods being deployed. Conversely, when the load decreases, the workload resources are scaled back down.
The Cloudentity Helm Chart supports the HorizontalPodAutoscaler resource. Metrics Server must be deployed to expose the Cloudentity resource usage metrics that autoscaling relies on.
For more details on autoscaling, see the Horizontal Pod Autoscaler section of the Kubernetes documentation.
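For orientation, a HorizontalPodAutoscaler object has roughly the shape below. This is a minimal sketch, not the manifest the chart renders: the object name, the target Deployment name, and the numeric values are hypothetical, and `autoscaling/v2` is assumed to be available (older clusters expose the same fields under `autoscaling/v2beta2`).

```yaml
# Illustrative HorizontalPodAutoscaler; names and numbers are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment     # hypothetical workload to scale
  minReplicas: 2
  maxReplicas: 5
  metrics:
    # Scale on average CPU utilization, expressed as a percentage
    # of the CPU request of the target pods.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```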
Prerequisites
- Kubernetes cluster v1.16+
- Kubernetes Metrics Server
- Helm v3.0+
- Resource requests specified in the Helm chart
Autoscaling can be enabled for the base Cloudentity pods as well as for the worker pods. The configuration parameters are identical, although the worker pod configuration is located under the workers key.
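As a rough sketch of the worker variant, the fragment below assumes that the chart nests the same autoscaling block under a top-level `workers` key. The exact layout and the numbers are illustrative assumptions and should be checked against the chart's default values.

```yaml
# Hypothetical values fragment: nesting under `workers` is assumed
# from the description, not copied from the chart.
workers:
  autoscaling:
    enabled: true      # turn on autoscaling for worker pods only
    minReplicas: 2     # illustrative lower bound
    maxReplicas: 6     # illustrative upper bound
```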
For autoscaling to work properly, resource requests must be set for the Cloudentity pods.
    resources:
      requests:
        cpu: 500m
        memory: 1.2Gi
To enable autoscaling integration, set the autoscaling.enabled parameter to true.
When autoscaling is enabled, the replicaCount parameter is ignored.
- The autoscaling.minReplicas parameter sets the minimum number of replicas.
- The autoscaling.maxReplicas parameter sets the maximum number of replicas.
- The autoscaling.targetCPUUtilizationPercentage parameter can be used to enable CPU autoscaling at a given percentage.
- The autoscaling.targetMemoryUtilizationPercentage parameter can be used to enable memory autoscaling at a given percentage.
- The behavior parameter can be used to configure detailed scaling behaviors.
    autoscaling:
      ## If true, autoscaling is enabled
      ##
      enabled: true
      ## Set a minimum number of 3 replicas
      ##
      minReplicas: 3
      ## Set a maximum number of 9 replicas
      ##
      maxReplicas: 9
      ## Enable CPU autoscaling at 70% of request utilization
      ##
      targetCPUUtilizationPercentage: 70
      ## Enable memory autoscaling at 50% of request utilization
      ##
      targetMemoryUtilizationPercentage: 50
      ## Consider utilization values from last 5 minutes during scaling
      ## Scale Down one pod at a time every 180 seconds
      ##
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Pods
              value: 1
              periodSeconds: 180
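To make the percentage targets concrete, the replica count the Horizontal Pod Autoscaler aims for follows the formula from the Kubernetes documentation. The worked numbers are an illustration only: the observed utilization figures (105% CPU, 60% memory) are assumed, while the targets and replica bounds come from the sample configuration above.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Assumed state: 3 replicas, average CPU at 105% of the request, target 70%.
\[
\text{CPU: } \left\lceil 3 \times \tfrac{105}{70} \right\rceil
  = \lceil 4.5 \rceil = 5
\]

% Assumed state: same 3 replicas, average memory at 60% of the request, target 50%.
\[
\text{Memory: } \left\lceil 3 \times \tfrac{60}{50} \right\rceil
  = \lceil 3.6 \rceil = 4
\]
```

When both targets are set, the larger result applies, so in this illustration the deployment would grow to 5 replicas, which lies within the configured range of 3 to 9.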
- Autoscaling is based on the average usage across all Cloudentity pods for a given metric.
- An average value of 100% for a metric corresponds to the resource requests set for the pods.
- CPU and memory autoscaling can work at the same time. The metric that yields the higher desired replica count determines the scaling.
- The default upscale delay is 0s (controlled by the cluster operator in kube-controller-manager).
- The default downscale delay is 5m (controlled by the cluster operator in kube-controller-manager).
- The metrics scrape interval can be configured in metrics-server.
- The default scale-up behavior is to add 100% of the currently running replicas or 4 pods (whichever is higher) every 15 seconds, based on the latest metric.
- The default scale-down behavior is to remove up to 100% of the currently running replicas every 15 seconds, based on 5 minutes of metrics.
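Written out explicitly, the default scale-up and scale-down behavior described in the last two notes corresponds to the behavior stanza shown in the Kubernetes documentation. The fragment below is for reference only. Placing it under the chart's autoscaling key is an assumption based on the sample configuration, and leaving the behavior parameter unset should have the same effect.

```yaml
# Kubernetes default HPA behavior, spelled out for reference.
behavior:
  scaleDown:
    # Use the highest recommendation from the last 5 minutes.
    stabilizationWindowSeconds: 300
    policies:
      # Remove up to 100% of current replicas every 15 seconds.
      - type: Percent
        value: 100
        periodSeconds: 15
  scaleUp:
    # React to the latest metric without a stabilization delay.
    stabilizationWindowSeconds: 0
    policies:
      # Add up to 100% of current replicas every 15 seconds...
      - type: Percent
        value: 100
        periodSeconds: 15
      # ...or up to 4 pods every 15 seconds.
      - type: Pods
        value: 4
        periodSeconds: 15
    # Pick whichever policy allows the larger change.
    selectPolicy: Max
```

Overriding only scaleDown, as in the sample configuration, leaves the scaleUp defaults in place.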