TimescaleDB Datastore Overview
TimescaleDB is an open-source time-series database built on top of PostgreSQL. It is designed to handle large amounts of time-series data and to provide fast querying and aggregation. It runs on various platforms, such as Linux, Windows, and macOS.
One of the key features of TimescaleDB is its ability to scale horizontally. This means that as data grows, you can add more machines to the cluster, rather than upgrading a single machine, to maintain fast query performance. This is achieved by using a technique called “time-series partitioning”, in which the data is automatically partitioned by time interval.
Another beneficial feature of TimescaleDB is its ability to handle large amounts of data, up to billions of rows. This is possible thanks to its hybrid storage approach, in which the database uses disk-based storage for historical data and RAM-based storage for recent data. This enables fast queries on recent data while historical data remains available for querying and analysis.
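As a rough illustration of time-based partitioning and aggregation, the sketch below writes a small SQL file that turns a regular table into a hypertable and runs a time-bucketed query. The table audit_events, its columns, and the file name are hypothetical and are not part of the Cloudentity schema. create_hypertable and time_bucket are standard TimescaleDB functions.

```shell
# Write an illustrative SQL file. The audit_events table and its columns are hypothetical.
cat > hypertable_example.sql <<'SQL'
CREATE TABLE audit_events (
  time       TIMESTAMPTZ NOT NULL,
  event_type TEXT        NOT NULL
);

-- Partition the table into one-day chunks based on the time column.
SELECT create_hypertable('audit_events', 'time', chunk_time_interval => INTERVAL '1 day');

-- Count events per hour and event type over the last day.
SELECT time_bucket('1 hour', time) AS bucket, event_type, count(*) AS events
FROM audit_events
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket, event_type
ORDER BY bucket;
SQL

# To run it against a database you control (the connection string is an assumption):
# psql "$DATABASE_URL" -f hypertable_example.sql
```

The one-day chunk interval is only an example value. A suitable interval depends on your ingest rate and available memory.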
Why TimescaleDB
Cloudentity uses TimescaleDB to store audit and analytics/metrics data because it is a powerful and efficient tool for handling large amounts of time-series data. One of the key benefits of TimescaleDB is its ability to scale horizontally, which means that as the volume of audit/analytics/metrics data grows, Cloudentity can add more machines to the cluster, rather than upgrading a single machine, to maintain fast query performance.
Another benefit of TimescaleDB when storing audit data is its ability to handle complex queries efficiently. TimescaleDB provides various time-based aggregate functions, which enable Cloudentity to run complex queries on the audit/analytics/metrics data with high performance. This makes it easy to analyze the audit data to identify patterns, detect anomalies, and extract insights.
TimescaleDB Installation
Important
Of the three databases that we install and configure, TimescaleDB is the only optional one. Remember that if you choose not to install TimescaleDB, you won’t be able to use the audit events, analytics, and metrics features and APIs built into our platform.
At Cloudentity, to install and configure TimescaleDB, we use Helm - a popular package manager for Kubernetes that allows users to easily install and configure complex software such as TimescaleDB on a Kubernetes cluster. By using Helm to install TimescaleDB, users can take advantage of several benefits that make the process of deploying and managing TimescaleDB much simpler and more efficient.
Firstly, Helm provides a convenient way to define and manage the configuration of TimescaleDB, including the number of nodes, storage settings, and networking settings, in a single, easy-to-read values file that is passed to the chart. This makes it easy to understand and modify the configuration of TimescaleDB as needed.
Additionally, Helm lets you manage and upgrade the TimescaleDB deployment in a controlled and repeatable way. Updates or upgrades to the TimescaleDB software can be rolled out to the cluster in a predictable manner, which reduces the risk of disrupting the service.
When you install the Cloudentity platform on Kubernetes using Helm Charts, you can see that the TimescaleDB dependency is included in our kube-acp-stack Helm Chart.
TimescaleDB Version Recommendation
Database: 2.8.0 (with Postgres 14.5)
Helm chart: 0.16.3
Supported TimescaleDB Versions
- 2.8.x (with Postgres 14.5)
Install TimescaleDB in Kubernetes Cluster
- Create a namespace for TimescaleDB.
kubectl create namespace acp-db
Prepare configmap
- Create create_extra_dbs.sh, a script that creates the database for Cloudentity to use. Write the following content to the file:
#!/bin/bash
psql -d "$1" <<__SQL__
CREATE ROLE acp WITH LOGIN SUPERUSER;
CREATE DATABASE acpdb OWNER acp;
GRANT ALL PRIVILEGES ON DATABASE acpdb TO acp;
__SQL__
- Upload create_extra_dbs.sh to the cluster as a ConfigMap.
kubectl create configmap timescale-post-init --from-file=create_extra_dbs.sh --namespace acp-db
Prepare passwords setup
- Create the set_passwords.sh file (remember to replace the password with your own).
#!/bin/bash
psql -d "$1" --file=- --set ON_ERROR_STOP=1 << __SQL__
SET log_statement TO none;      -- prevent these passwords from being logged
ALTER USER acp WITH PASSWORD 'PaSsW0rD';
__SQL__
- Create a secret.
kubectl create secret generic timescale-post-init-pw --from-file=set_passwords.sh --namespace acp-db
Install TimescaleDB
- Prepare the configuration file, i.e. config.yaml.
postInit:
  - configMap:
      name: timescale-post-init
  - secret:
      name: timescale-post-init-pw
- To install the TimescaleDB database, execute the following commands in your terminal:
helm repo add timescale 'https://charts.timescale.com'
helm repo update
helm upgrade --install timescaledb --namespace acp-db timescale/timescaledb-single -f config.yaml --version 0.13.1
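After the install command finishes, it can help to confirm that the release exists and that its pods are running. The function below is a minimal sketch of such a check. The function name is hypothetical, and the defaults assume the release name (timescaledb) and namespace (acp-db) used in the install command above.

```shell
# Report the Helm release status, then list the pods in the TimescaleDB namespace.
# Arguments (both optional): release name, namespace.
check_timescaledb_release() {
  local release="${1:-timescaledb}"
  local namespace="${2:-acp-db}"
  helm status "$release" --namespace "$namespace" &&
    kubectl get pods --namespace "$namespace"
}
```

Call check_timescaledb_release without arguments to use the defaults, or pass your own release name and namespace.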
Configure TimescaleDB Dependency
If you wish to configure the connection between the Cloudentity platform and TimescaleDB, edit the values.yaml file for your Cloudentity deployment and apply the changes.
Configure Connection Between Cloudentity Platform and TimescaleDB Datastore
If you need to configure the connection between the Cloudentity platform and TimescaleDB:
- Refer to the timescale (timescale client) section of the Cloudentity Platform Configuration Reference to learn about available configuration options.
- Change the configuration for the connection in the acp.config.data.timescale section of the Cloudentity Platform values.yaml file for your deployment.
- Apply the changes to your deployment.
TimescaleDB Integration Configuration Example
If you chose to deploy the TimescaleDB datastore following the instructions from the Install TimescaleDB in Kubernetes Cluster section, the configuration for the connection between TimescaleDB and the Cloudentity platform looks like the following:
acp:
  enabled: true
  config:
    data:
      timescale:
        enabled: true
        url: postgres://acp:PaSsW0rD@timescaledb.acp-db.svc.cluster.local/acpdb
        migrations:
          path: ./migrations/timescale
          timeout: 1m0s
With this configuration present in the values.yaml file, the following configuration is included in the /data/extraconfig.yaml file and passed to your Cloudentity deployment:
timescale:
  enabled: true
  migrations:
    path: ./migrations/timescale
    timeout: 1m0s
  url: postgres://acp:PaSsW0rD@timescaledb.acp-db.svc.cluster.local/acpdb
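The connection URL in this example is made up of the database role, its password, the Kubernetes service DNS name, and the database name. The helper below is a small sketch that assembles the URL from those parts. The function name is hypothetical, and it assumes that the service is named after the Helm release, as in the URL above, and that the cluster uses the default cluster.local domain.

```shell
# Build the connection URL from its parts.
# Arguments: user, password, Helm release name, namespace, database.
# Assumes the service name equals the release name and the default cluster domain.
timescale_url() {
  local user="$1" password="$2" release="$3" namespace="$4" database="$5"
  printf 'postgres://%s:%s@%s.%s.svc.cluster.local/%s\n' \
    "$user" "$password" "$release" "$namespace" "$database"
}

# Prints postgres://acp:PaSsW0rD@timescaledb.acp-db.svc.cluster.local/acpdb
timescale_url acp 'PaSsW0rD' timescaledb acp-db acpdb
```

The helper does not percent-encode its arguments, so a password that contains characters such as @, / or : needs to be encoded before it is placed in the URL.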
Troubleshooting
If your TimescaleDB deployment is configured incorrectly, you can see the following error message appearing in Cloudentity logs:
{"error":"failed to create database client: failed to connect to `host=timescale user=postgres database=acp`: hostname resolving error (lookup timescale on 1.0.0.0:1: server misbehaving)","level":"fatal","msg":"failed to connect to timescale database"}
If there is a connection issue between the Cloudentity platform and TimescaleDB, you can see the following error message appearing in Cloudentity logs:
{"error":"failed to create database client: failed to connect to `host=acp-cockroachdb-public user=root database=defaultdb`: dial error (dial tcp 1.0.0.0:1: connect: connection refused)","level":"fatal","msg":"failed to connect to the database"}
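When the log reports a hostname resolution error, a useful first step is to compare the host in the configured URL with the services that actually exist in the TimescaleDB namespace. The sketch below extracts the host from a connection URL with plain shell parameter expansion. The function name is hypothetical, and it assumes that the credentials are percent-encoded, so they contain no raw / characters.

```shell
# Print the host part of a connection URL such as postgres://user:pass@host:port/db.
url_host() {
  local rest="${1#*://}"        # drop the scheme
  rest="${rest%%/*}"            # drop the path (database name)
  rest="${rest##*@}"            # drop the credentials, if any
  printf '%s\n' "${rest%%:*}"   # drop an optional port
}

# Prints timescaledb.acp-db.svc.cluster.local
url_host 'postgres://acp:PaSsW0rD@timescaledb.acp-db.svc.cluster.local/acpdb'
```

The first label of the printed host should match a service listed by kubectl get svc --namespace acp-db, and the second label should match that namespace.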
Hardening TimescaleDB Installation
Before going to production, make sure your TimescaleDB deployment is properly hardened by implementing security measures and best practices. The goal is to protect sensitive data and maintain the integrity, availability, and confidentiality of the datastore.
Network Security
- Deploy a Dedicated Namespace
Creating a dedicated namespace for your TimescaleDB deployment helps isolate it from other applications in the Kubernetes cluster. This isolation reduces the risk of unauthorized access or interference between applications.
kubectl create namespace timescaledb
- Configure NetworkPolicies
NetworkPolicies are Kubernetes resources that allow you to control network traffic between pods in your cluster. By configuring NetworkPolicies, you can restrict incoming and outgoing traffic to the TimescaleDB pods, limiting the potential attack surface.
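As one possible starting point, the sketch below writes a NetworkPolicy manifest that allows all traffic between pods inside the TimescaleDB namespace and admits PostgreSQL connections on port 5432 only from one other namespace. The policy name, the file name, and the acp namespace for the Cloudentity platform are assumptions. The acp-db namespace is the one used in the installation steps, so adjust it if you deployed elsewhere.

```shell
# Write an illustrative NetworkPolicy manifest. Names and namespaces are assumptions.
cat > timescaledb-networkpolicy.yaml <<'YAML'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: timescaledb-ingress
  namespace: acp-db
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    # Allow all traffic between pods in the same namespace (for example, replication).
    - from:
        - podSelector: {}
    # Allow PostgreSQL connections from the (hypothetical) Cloudentity namespace.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: acp
      ports:
        - protocol: TCP
          port: 5432
YAML

# Review the file, then apply it:
# kubectl apply -f timescaledb-networkpolicy.yaml
```

NetworkPolicies take effect only if the cluster's network plugin enforces them, so verify the behavior in your environment after applying the manifest.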
Secrets Management
Managing secrets is an essential aspect of securing your TimescaleDB deployment. Properly handling secrets ensures that confidential data, such as credentials, is not exposed or compromised.
- Set up Patroni.
Patroni is a high-availability solution for PostgreSQL databases. TimescaleDB Helm charts use random secrets by default for Patroni. You can configure these secrets according to your password policy:
Timescale Helm Charts Secret Patroni
To generate secure passwords that adhere to your password policy, use tools like pwgen or openssl.
- Set up pgBackRest.
pgBackRest is a backup and restore solution for PostgreSQL databases. Configure the credentials needed for automated backup processes in external storage, such as AWS S3:
Cloudentity Secrets Security Recommendation
Encrypt your Kubernetes secrets using tools like Mozilla SOPS or Bitnami Sealed Secrets.
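For the password generation mentioned above, a minimal sketch with openssl could look like the following. The variable name is hypothetical, and whether a 32-character hexadecimal string satisfies your password policy is for you to decide.

```shell
# Generate 16 random bytes and print them as 32 hexadecimal characters.
# Hex output avoids characters that would need escaping in SQL or connection URLs.
DB_PASSWORD="$(openssl rand -hex 16)"
echo "generated a password of ${#DB_PASSWORD} characters"
```

Store the generated value in a secret rather than in a file that is committed to version control.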
Certificate Management
Secure communication between your application and TimescaleDB is crucial. By default, TimescaleDB Helm charts use autogenerated certificates. You can configure these certificates:
Timescale Helm Charts Secret Certificate
Generate the certificates using the following commands:
# Generate CA key and certificate
openssl req -new -nodes -text -out ca.csr -keyout ca.key -subj "/CN=timescaledb-ca"
openssl x509 -req -in ca.csr -text -out ca.crt -extensions v3_ca -signkey ca.key -days 3650
# Generate server key and certificate
openssl req -new -nodes -text -out server.csr -keyout server.key -subj "/CN=timescaledb-server"
openssl x509 -req -in server.csr -text -out server.crt -extensions v3_req -CA ca.crt -CA
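For reference, here is a self-contained variant of the same flow: create a CA first, then issue a server certificate signed by it. It is a sketch that assumes OpenSSL is installed. The certs directory, the RSA key size, and the validity periods are illustrative choices and do not come from the chart documentation.

```shell
set -eu
mkdir -p certs

# Self-signed CA certificate, valid for 10 years.
openssl req -new -x509 -nodes -newkey rsa:2048 -days 3650 \
  -subj "/CN=timescaledb-ca" -keyout certs/ca.key -out certs/ca.crt

# Server private key and certificate signing request.
openssl req -new -nodes -newkey rsa:2048 \
  -subj "/CN=timescaledb-server" -keyout certs/server.key -out certs/server.csr

# Sign the server certificate with the CA, valid for 1 year.
openssl x509 -req -in certs/server.csr -days 365 \
  -CA certs/ca.crt -CAkey certs/ca.key -CAcreateserial -out certs/server.crt

# Restrict access to the private keys.
chmod 600 certs/ca.key certs/server.key

# Confirm that the server certificate chains to the CA.
openssl verify -CAfile certs/ca.crt certs/server.crt
```

For certificate verification by hostname, the server certificate's CN or subject alternative names need to match the name that clients use to reach the database.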
Configure RBAC
Role-Based Access Control (RBAC) is a security mechanism that allows you to manage access to computer or network resources based on user roles within your organization.
Timescale Helm Charts Role TimescaleDB
Backup and Restore
Backing up your TimescaleDB clusters is crucial for ensuring data durability and recovering from system failures. TimescaleDB allows you to create backups of clusters, which you can use to bootstrap a new instance in case of a system failure:
Timescale Helm Charts TimescaleDB Single
Backup and Restore Best Practices
To maximize the reliability and security of your backup strategy, consider the following best practices:
- Schedule regular backups to ensure up-to-date data is available for recovery.
- Store backups in two independent storage locations (e.g., two S3 buckets in different regions) to protect against data loss due to a single storage failure.
- Monitor the success of backup creation to identify and resolve any issues promptly.
- Encrypt your backups to protect sensitive data from unauthorized access.
- Test and document your backup restore procedures to ensure a smooth recovery process during an actual system failure.