Deployment and Operations

Monitoring System

Installing and configuring a generic monitoring system.

Monitoring at Cloudentity

Monitoring is a critical aspect for us at Cloudentity, playing a key role in our SRE, DevOps, and Support operations across diverse infrastructures. Finding an ideal solution was challenging, requiring careful consideration of various factors, including system architecture and configuration. This article outlines how we utilize Thanos to store Prometheus metrics in a multi-cluster AWS EKS-based setup. Additionally, we've provided ready-to-use monitoring configurations to ensure effective Cloudentity monitoring within your infrastructure.


Here are some assumptions we wanted our monitoring system to support:

  • We needed a centralized observability system that collects data from all regions' clusters and can handle a growing load.
  • The infrastructure should be highly available and resilient.
  • The solution should be cost-efficient.
  • Collected data should be kept for at least 90 days.
  • Alerting should prevent false negatives and focus solely on what is important.



Our monitoring system is made up of two main components:

  • We have local Thanos clusters deployed in each of our SaaS regions. These clusters are independent and store (and visualize) metrics from the AWS region where they are located.

  • The central Thanos cluster which handles alerts and gives access to the main Grafana instance (it allows us to view/visualize data from all regions in one convenient location).


This design gives us several benefits:

  • High availability - we can still perform our SRE work even if the central Thanos cluster is down, as we can access the local clusters in critical situations. In most cases, the failure of a single region is not a problem because the central Thanos cluster still has access to the data collected by local instances. It's important to note that we store our data in S3 buckets.
  • The central Grafana instance allows us to monitor our infrastructure from a holistic perspective. This is particularly crucial as our traffic can be automatically distributed across different regions depending on the user’s location.
  • Our solution is cost-effective because we don't have to send all the data from local clusters to the central one. Instead, the data is stored securely in S3 buckets that are localized per region. These buckets are not dependent on our application infrastructure and are difficult to remove or destroy. Additionally, the traffic between the central cluster and local clusters is limited to alert notifications and data that our engineers request through the central Grafana.
  • Our data is stored in S3 buckets, which provides a favorable balance between performance and cost. This enables us to retain our data for 90 days or more.
  • The central Thanos cluster has an additional role in our company. It can monitor non-EKS-related elements, such as the state of our WAF, stats page, and official website. Additionally, this approach allows us to group alerts from different places before sending them to us via Slack or PagerDuty.


For a complete and ready-to-use solution, consider exploring our Cloudentity on Kubernetes via the GitOps approach. Get started with our quickstart guide, and delve deeper with the deployment configuration details.



This article focuses on configuring a simplified version of the monitoring based on the local (single) Thanos cluster, but it’s worth noting that a similar configuration can be applied to a central cluster as well.

We utilize AWS EKS; however, the solution outlined in this article can likely be implemented on any Kubernetes cluster, such as GKE, AKS, or bare metal. It assumes that you possess a fundamental understanding of Helm and Prometheus.

We utilize two Helm charts: kube-prometheus-stack, which installs Prometheus together with the Thanos sidecar, and thanos, which is installed on top of the existing Prometheus deployment.


Configuration Files

Create the following files:


namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring


To demonstrate, we will use MinIO, a tool that provides simple S3-compatible storage. However, for storing production metrics, we highly recommend using AWS S3 or a similar solution.

objstorage-local.yaml (for local testing purposes):

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: thanos-objstore
  namespace: monitoring
stringData:
  objstore.yml: |-
    type: s3
    config:
      bucket: thanos
      endpoint: thanos-minio.monitoring:9000
      access_key: admin
      secret_key: p@ssw0rd!
      insecure: true

objstorage-prod.yaml (for production purposes):

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: thanos-objstore
  namespace: monitoring
stringData:
  objstore.yml: |-
    type: s3
    config:
      bucket: thanos-env_name-region_name
      region: region_name
      aws_sdk_auth: true
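The production variant relies on aws_sdk_auth, i.e. credentials from the AWS SDK default chain, which on EKS typically means IRSA (IAM Roles for Service Accounts). A sketch of the supporting AWS-side setup follows; the bucket name, cluster name, service-account name, and policy ARN are all placeholders to be replaced with your own:

```shell
# Create the per-region metrics bucket (placeholder names; pick your own).
aws s3api create-bucket \
  --bucket thanos-prod-eu-west-1 \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1

# Grant pods S3 access via IRSA; the policy ARN here is a placeholder.
eksctl create iamserviceaccount \
  --cluster my-eks-cluster \
  --namespace monitoring \
  --name thanos-s3 \
  --attach-policy-arn arn:aws:iam::123456789012:policy/thanos-s3-access \
  --approve
```

Note that the charts must then be pointed at the resulting service account (or the role annotation copied onto their own service accounts); consult the chart values for the exact keys.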

prometheus.yaml (see the kube-prometheus-stack chart values for the full list of available parameters):

# Control-plane components are not reachable for scraping on managed EKS,
# and alerting is handled by the central cluster.
alertmanager:
  enabled: false
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeEtcd:
  enabled: false
grafana:
  adminPassword: prom-operator
  additionalDataSources:
    - name: Thanos Query
      type: prometheus
      url: 'http://thanos-query-frontend:9090/'
fullnameOverride: prometheus
prometheus:
  thanosService:
    enabled: true
  thanosServiceMonitor:
    enabled: true
  prometheusSpec:
    thanos:
      objectStorageConfig:
        key: objstore.yml
        name: thanos-objstore
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    prometheusExternalLabelNameClear: true
    disableCompaction: true
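Before installing, it can be worth rendering the chart with these values to catch indentation or schema mistakes early. This assumes the prometheus-community repository has already been added to Helm:

```shell
# Render the chart locally; a non-zero exit hints at a values problem.
helm template prometheus prometheus-community/kube-prometheus-stack \
  --values ./prometheus.yaml --namespace monitoring > /dev/null \
  && echo "prometheus.yaml renders cleanly"
```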

thanos.yaml (see the thanos chart values for the full list of available parameters):

minio:
  enabled: true
  auth:
    rootUser: admin
    rootPassword: p@ssw0rd!
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true
    default:
      absent_rules: true
      store_gateway: true
      compaction: true
      query: true
query:
  replicaCount: 1
  dnsDiscovery:
    sidecarsService: prometheus-thanos-discovery
    sidecarsNamespace: monitoring
compactor:
  enabled: true
  retentionResolutionRaw: 7d
  retentionResolution5m: 30d
  retentionResolution1h: 90d
  persistence:
    size: 100Gi
storegateway:
  enabled: true
  persistence:
    size: 10Gi
existingObjstoreSecret: thanos-objstore
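The three retention settings work together with Thanos downsampling: raw samples are kept for a week, while 5-minute and 1-hour aggregates cover the longer windows. A rough per-series sample-count estimate, assuming a 30-second scrape interval (an assumption for illustration, not a chart default):

```shell
# Approximate samples retained per series at each resolution tier.
SCRAPE=30                              # assumed scrape interval in seconds
RAW=$((  7 * 86400 / SCRAPE ))         # raw resolution, kept 7 days
FIVE=$(( 30 * 86400 / 300 ))           # 5m resolution, kept 30 days
HOUR=$(( 90 * 86400 / 3600 ))          # 1h resolution, kept 90 days
echo "raw=$RAW 5m=$FIVE 1h=$HOUR"      # raw=20160 5m=8640 1h=2160
```

This is why long-range dashboard queries stay cheap: they read the downsampled tiers while the raw data ages out.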


Add the prometheus-community and thanos charts repositories to Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add thanos https://charts.bitnami.com/bitnami
helm repo update

Create a monitoring namespace:

kubectl apply -f ./namespace.yaml
namespace/monitoring created

Create an object store configuration:

kubectl apply -f ./objstorage-local.yaml
secret/thanos-objstore created
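To double-check that the secret landed with the expected key (objstore.yml, the key the Prometheus values reference), you can decode it back:

```shell
# Print the stored object-store config; the "objstore.yml" key must exist.
kubectl get secret thanos-objstore --namespace monitoring \
  -o jsonpath='{.data.objstore\.yml}' | base64 --decode
```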

Install Prometheus stack:

helm install prometheus prometheus-community/kube-prometheus-stack --values ./prometheus.yaml --namespace monitoring
NAME: prometheus
LAST DEPLOYED: Mon Jan 1 00:12:30 2023
NAMESPACE: monitoring
STATUS: deployed
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus"

After successfully installing Prometheus, you can access Grafana by going to http://localhost:3000. However, make sure to replace “localhost” with the appropriate IP depending on your scenario.
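When the cluster is only reachable via kubectl, a port-forward is the quickest way in. The service name below assumes the chart's default naming together with fullnameOverride: prometheus from the values file above:

```shell
# Forward local port 3000 to the Grafana service (default chart naming).
kubectl port-forward --namespace monitoring svc/prometheus-grafana 3000:80
```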


The default Grafana username is admin and the password is prom-operator. Remember to change those values in a production environment.

Grafana includes pre-built dashboards that allow you to monitor the current statistics of your clusters.

Predefined Dashboards

Grafana also ships with a preconfigured data source (Thanos Query), but as you can see by opening any dashboard, no data is visible yet. You can also confirm that the data source does not pass the connection test.

Grafana Dashboard

Install Thanos:

helm install thanos thanos/thanos --values thanos.yaml --namespace monitoring
NAME: thanos
LAST DEPLOYED: Mon Jun 26 14:22:28 2023
NAMESPACE: monitoring
STATUS: deployed
CHART NAME: thanos
APP VERSION: 0.31.0

** Please be patient while the chart is being deployed **

Thanos chart was deployed enabling the following components:

  • Thanos Query
  • Thanos Compactor
  • Thanos Store Gateway

Thanos Query can be accessed through the following DNS name from within your cluster:

thanos-query.monitoring.svc.cluster.local (port 9090)

To access Thanos Query from outside the cluster execute the following commands:

  1. Get the Thanos Query URL by running these commands:

    export SERVICE_PORT=$(kubectl get --namespace monitoring -o jsonpath="{.spec.ports[0].port}" services thanos-query)
    kubectl port-forward --namespace monitoring svc/thanos-query ${SERVICE_PORT}:${SERVICE_PORT} &
    echo "http://127.0.0.1:${SERVICE_PORT}"
  2. Open a browser and access Thanos Query using the obtained URL.

Once Thanos is installed, you can verify the connection between the Thanos Query frontend and Grafana:

Thanos datasource in Grafana
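Beyond Grafana's connection test, the Querier's HTTP API can confirm that the Prometheus sidecar has registered as a store. Assuming thanos-query is port-forwarded to local port 9090 as described above:

```shell
# List registered store endpoints (the Prometheus sidecar should appear).
curl -s http://localhost:9090/api/v1/stores
# Run a trivial instant query to confirm metrics are flowing.
curl -s 'http://localhost:9090/api/v1/query?query=up'
```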

If everything looks good, you can confirm that data collected by Thanos is visible in one of the predeployed Grafana dashboards:

Grafana Dashboard

Post-Installation Summary

You may be wondering why we chose to install Thanos alongside Prometheus. There are several reasons for this decision. Firstly, the Thanos architecture is more scalable and easier to manage. Additionally, Thanos can store your cluster data on S3-compatible backends, and the Compactor continuously compacts and downsamples your historical data.

How to Uninstall

If you want to remove Prometheus and Thanos from your cluster, you can do so by executing:

helm uninstall thanos --namespace monitoring
helm uninstall prometheus --namespace monitoring
kubectl delete -f ./objstorage-local.yaml
kubectl delete -f ./namespace.yaml
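Deleting the namespace removes its PersistentVolumeClaims, but depending on the StorageClass reclaim policy, released PersistentVolumes may still remain. A quick check before calling the cleanup done:

```shell
# Depending on the reclaim policy, released volumes may linger cluster-wide.
kubectl get pv | grep -i released
# Delete any leftovers only after confirming they are no longer needed:
# kubectl delete pv <name>
```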


In conclusion, Thanos helps us achieve high availability, durability, and cost-efficiency with Prometheus. While it may require substantial effort to implement Thanos and Prometheus across multiple Kubernetes clusters, the benefits of having a highly available Prometheus make it a worthwhile endeavor. This project was one of our challenging experiences, and we would have appreciated having this article as a guide when we began.

Updated: Oct 27, 2023