Installing and Configuring TimescaleDB for Storing Audit Data

Learn how to configure the TimescaleDB datastore when deploying the Cloudentity platform.

TimescaleDB Datastore Overview

TimescaleDB is an open-source time-series database built on top of PostgreSQL. It is designed to handle large amounts of time-series data and to provide fast querying and aggregation. It runs on platforms such as Linux, Windows, and macOS.

One of the key features of TimescaleDB is its ability to scale horizontally. This means that as data grows, you can add more machines to the cluster, rather than upgrading a single machine, to maintain fast query performance. This is achieved by using a technique called “time-series partitioning” where the data is automatically partitioned based on the time interval.

Another beneficial feature of TimescaleDB is its ability to handle large amounts of data, on the order of billions of rows. This is possible thanks to its hybrid storage engine, where the database uses disk-based storage for historical data and RAM-based storage for recent data. This enables fast queries on recent data while still letting you query and analyze historical data.

Why TimescaleDB

Cloudentity uses TimescaleDB to store audit and analytics/metrics data because it is a powerful and efficient tool for handling large amounts of time-series data. One of the key benefits of TimescaleDB is its ability to scale horizontally, which means that as the volume of audit/analytics/metrics data grows, Cloudentity can add more machines to the cluster, rather than upgrading a single machine, to maintain fast query performance.

Another benefit of TimescaleDB when storing audit data is its ability to handle complex queries efficiently. TimescaleDB provides various time-based aggregate functions, which enable Cloudentity to run complex queries on the audit/analytics/metrics data with high performance. This makes it easier to analyze the audit data to identify patterns, detect anomalies, and extract insights.
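As a rough illustration of time-based partitioning and aggregation, the sketch below writes a small SQL script that creates a hypertable and runs an hourly aggregate. The table `audit_events_demo` and its columns are hypothetical and do not reflect the schema the Cloudentity platform uses; `create_hypertable` and `time_bucket` are TimescaleDB functions.

```shell
# Write an illustrative SQL script. Table and column names are hypothetical.
cat > hypertable_demo.sql <<'SQL'
CREATE TABLE audit_events_demo (
    time        TIMESTAMPTZ NOT NULL,
    tenant_id   TEXT        NOT NULL,
    event_type  TEXT        NOT NULL
);

-- Partition the table into one-day chunks on the "time" column.
SELECT create_hypertable('audit_events_demo', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- Count events per hour and event type over the last 24 hours.
SELECT time_bucket('1 hour', time) AS bucket,
       event_type,
       count(*) AS events
FROM audit_events_demo
WHERE time > now() - INTERVAL '24 hours'
GROUP BY bucket, event_type
ORDER BY bucket;
SQL
```

To try it, run the script against a test database that has the TimescaleDB extension enabled, for example with `psql -f hypertable_demo.sql` and connection settings for your environment.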

TimescaleDB Installation

Important

Of the three databases that we install and configure, TimescaleDB is the only optional one. Remember that if you choose not to install TimescaleDB, you won’t be able to use the audit events, analytics, and metrics features and APIs built into our platform.

At Cloudentity, we use Helm to install and configure TimescaleDB. Helm is a popular package manager for Kubernetes that lets you install and configure complex software such as TimescaleDB on a Kubernetes cluster, and it simplifies deploying and managing TimescaleDB in several ways.

Firstly, Helm lets you define and manage the TimescaleDB configuration, including the number of nodes, storage settings, and networking settings, in a single values file that is applied to the chart. This makes the configuration easy to understand and modify as needed.

Additionally, Helm lets you manage and upgrade the TimescaleDB deployment in a controlled and repeatable way. Updates to the TimescaleDB software can be rolled out to the cluster predictably, which reduces the risk of service disruption.

When you install the Cloudentity platform on Kubernetes using Helm charts, the TimescaleDB dependency is included in our kube-acp-stack Helm chart.

TimescaleDB Version Recommendation

  • Database: 2.14.2 (with Postgres 14.9)
  • Helm chart: 0.33.1

Supported TimescaleDB Versions

  • 2.8.x (with Postgres 14.5)

Configure TimescaleDB Dependency

To configure the connection between the Cloudentity platform and TimescaleDB, edit the values.yaml file for your Cloudentity deployment and apply the changes.

Configure Connection Between Cloudentity Platform and TimescaleDB Datastore

If you need to configure the connection between the Cloudentity platform and TimescaleDB:

  1. Refer to the timescale (timescale client) section of the Cloudentity Platform Configuration Reference to learn about available configuration options.

  2. Change the configuration for the connection in the acp.config.data.timescale section of the Cloudentity Platform values.yaml file for your deployment.

  3. Apply the changes to your deployment.

Configure TimescaleDB Datastore

Cloudentity delivers a production-grade example derived from our own operational experiences. This example is readily accessible within the acp-on-k8s repository. For deploying TimescaleDB, we employ a Helm chart, ensuring streamlined deployment and manageability, coupled with some custom configuration scripts optimized for high availability.

We recommend utilizing this example as a foundational reference for your TimescaleDB deployment. Get started with our quickstart guide.

Troubleshooting

If your TimescaleDB deployment is configured incorrectly, for example when the platform cannot resolve the configured TimescaleDB hostname, the following error message appears in the Cloudentity logs:

{"error":"failed to create database client: failed to connect to `host=timescale user=postgres database=acp`: hostname resolving error (lookup timescale on 1.0.0.0:1: server misbehaving)","level":"fatal","msg":"failed to connect to timescale database"}

If there is a connection issue between the Cloudentity platform and TimescaleDB, an error message similar to the following appears in the Cloudentity logs:

{"error":"failed to create database client: failed to connect to `host=acp-cockroachdb-public user=root database=defaultdb`: dial error (dial tcp 1.0.0.0:1: connect: connection refused)","level":"fatal","msg":"failed to connect to the database"}
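The two messages above point at different layers: the first is a name-resolution failure, the second a refused TCP connection. A minimal sketch for telling them apart is shown below. It assumes a Linux shell with `bash`, `getent`, and `timeout` available, for example in a debug pod running in the same namespace as the platform. The function name `check_db_endpoint` is hypothetical.

```shell
# Hypothetical helper: report whether a database endpoint fails at the DNS or TCP level.
check_db_endpoint() {
  local host="$1" port="$2"
  # Step 1: name resolution (the "hostname resolving error" case).
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "dns-failure"
    return 1
  fi
  # Step 2: TCP connect through bash's /dev/tcp (the "connection refused" case).
  if ! timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp-failure"
    return 2
  fi
  echo "ok"
}
```

For example, `check_db_endpoint timescale 5432` checks the host name from the first log line on the default PostgreSQL port. Substitute the host and port from your own configuration.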

Hardening TimescaleDB Installation

Before going to production, make sure your TimescaleDB deployment is properly hardened by implementing security measures and best practices. The goal is to protect sensitive data and maintain the integrity, availability, and confidentiality of the datastore.

Network Security

  1. Deploy a Dedicated Namespace

    Creating a dedicated namespace for your TimescaleDB deployment helps isolate it from other applications in the Kubernetes cluster. This isolation reduces the risk of unauthorized access or interference between applications.

    kubectl create namespace timescaledb
    
  2. Configure NetworkPolicies

    NetworkPolicies are Kubernetes resources that allow you to control network traffic between pods in your cluster. By configuring NetworkPolicies, you can restrict incoming and outgoing traffic to the TimescaleDB pods, limiting the potential attack surface.

    Timescale Helm Charts Network Policy
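As a starting point for the NetworkPolicy step above, the sketch below writes a manifest that admits PostgreSQL traffic only from the platform's namespace and allows the database pods to reach each other. The pod label `app: timescaledb`, the client namespace name `acp`, and the policy name are assumptions; check the labels your release actually sets before using it. The `kubernetes.io/metadata.name` namespace label is set automatically on recent Kubernetes versions.

```shell
# Write an example NetworkPolicy manifest. Label values and the "acp" namespace are assumptions.
cat > timescaledb-netpol.yaml <<'YAML'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: timescaledb-allow-acp
  namespace: timescaledb
spec:
  # Select the TimescaleDB pods (assumed label).
  podSelector:
    matchLabels:
      app: timescaledb
  policyTypes:
    - Ingress
  ingress:
    # Allow PostgreSQL connections from pods in the platform namespace.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: acp
      ports:
        - protocol: TCP
          port: 5432
    # Allow the database pods to talk to each other (replication, Patroni).
    - from:
        - podSelector:
            matchLabels:
              app: timescaledb
YAML
```

After reviewing the file, apply it with `kubectl apply -f timescaledb-netpol.yaml`. Because the policy selects the database pods for ingress, any traffic not matched by a rule is dropped, so add rules for monitoring or backup jobs if they connect to the pods.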

Secrets Management

Managing secrets is an essential aspect of securing your TimescaleDB deployment. Properly handling secrets ensures that confidential data, such as credentials, is not exposed or compromised.

  1. Set up Patroni.

    Patroni is a high-availability solution for PostgreSQL databases. TimescaleDB Helm charts use random secrets by default for Patroni. You can configure these secrets according to your password policy:

    Timescale Helm Charts Secret Patroni

    To generate secure passwords that adhere to your password policy, use tools like pwgen or openssl.

  2. Set up Pgbackrest.

    Pgbackrest is a backup and restore solution for PostgreSQL databases. Configure credentials needed for automated backup processes in external storage, such as AWS S3:

    Timescale Helm Charts Secret Pgbackrest
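For the Patroni passwords in step 1, one possible way to generate random values with openssl is sketched below. The helper name `gen_password` and the variable names are hypothetical. Adjust the length and character set to your password policy, since base64 output can contain `+` and `/`.

```shell
# Hypothetical helper: print a random password built from N random bytes, base64-encoded.
gen_password() {
  openssl rand -base64 "$1" | tr -d '\n'
}

# 24 random bytes encode to exactly 32 base64 characters.
superuser_password="$(gen_password 24)"
replication_password="$(gen_password 24)"
```

If you prefer pwgen, `pwgen -s 32 1` prints one random 32-character password. Avoid echoing the generated values to logs, and pass them to your secret manifests through the encrypted workflow you use for other Kubernetes secrets.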

Cloudentity Secrets Security Recommendation

Encrypt your Kubernetes secrets using tools like Mozilla SOPS or Bitnami Sealed Secrets.

Certificate Management

Secure communication between your application and TimescaleDB is crucial. By default, TimescaleDB Helm charts use autogenerated certificates. You can configure these certificates:

Timescale Helm Charts Secret Certificate

Generate the certificates using the following commands:

# Generate CA key and certificate
openssl req -new -nodes -text -out ca.csr -keyout ca.key -subj "/CN=timescaledb-ca"
openssl x509 -req -in ca.csr -text -out ca.crt -extensions v3_ca -signkey ca.key -days 3650

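# Generate server key and certificate
openssl req -new -nodes -text -out server.csr -keyout server.key -subj "/CN=timescaledb-server"
# Sign the server certificate with the CA (-CAkey and -CAcreateserial are assumed here; add -days to match your policy)
openssl x509 -req -in server.csr -text -out server.crt -extensions v3_req -CA ca.crt -CAkey ca.key -CAcreateserial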

Configure RBAC

Role-Based Access Control (RBAC) is a security mechanism that allows you to manage access to computer or network resources based on user roles within your organization.

Timescale Helm Charts Role TimescaleDB

Backup and Restore

Backing up your TimescaleDB clusters is crucial for ensuring data durability and recovering from system failures. TimescaleDB allows you to create backups of clusters, which you can use to bootstrap a new instance in case of a system failure:

Timescale Helm Charts TimescaleDB Single

Backup and Restore Best Practices

To maximize the reliability and security of your backup strategy, consider the following best practices:

  • Schedule regular backups to ensure up-to-date data is available for recovery.
  • Store backups in two independent storage locations (e.g., two S3 buckets in different regions) to protect against data loss due to a single storage failure.
  • Monitor the success of backup creation to identify and resolve any issues promptly.
  • Encrypt your backups to protect sensitive data from unauthorized access.
  • Test and document your backup restore procedures to ensure a smooth recovery process during an actual system failure.

Updated: Oct 27, 2023