Quick facts

Chart: terrakube/terrakube 4.6.2 (charts.terrakube.io)
appVersion: 2.30.1
Components: api + ui + executor + registry
Database: Per-cluster Postgres + Redis StatefulSets
Remote state: MinIO bucket terrakube-state
OIDC client: terrakube (public, PKCE), groups claim mapper
Hostnames: terrakube-{ui,api,reg}.apps.sub.comptech-lab.com
Failover (DC→DR): ~16 s
Failover (DR→DC): ~65 s (Spring Boot cold-start)

What it is

The DR images azbuilder/api-server:2.30.1 and azbuilder/executor:2.30.1 were hand-imported via ctr because Docker Hub pulls on DR are unreliable and slow.

Architecture

Four components per cluster: api, ui, executor, and registry.

Each cluster has its own Postgres + Redis StatefulSets (chart prereqs). Remote state is shared through the MinIO bucket terrakube-state, which is accessed with per-app credentials; the secret key is kept in a chmod-protected file at ~/cloud-init/minio-terrakube-secret-key.

OIDC: public client terrakube in the Keycloak comptech realm, with PKCE and a groups claim mapper (so a user's Keycloak group membership becomes their Terrakube role).
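To make the PKCE and groups-claim pieces concrete, here is a minimal sketch using only the Python standard library. It shows how an S256 code challenge is derived from a verifier (RFC 7636) and how a groups claim can be read out of a token payload. The claim name "groups" and the group name TERRAKUBE_ADMIN are assumptions for illustration, not values taken from the realm configuration, and the token is decoded without signature verification, so this is for inspection only.

```python
import base64
import hashlib
import json
import secrets


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as used by PKCE and JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_verifier() -> str:
    """Random code verifier: 32 random bytes encode to 43 characters."""
    return b64url(secrets.token_bytes(32))


def pkce_challenge(verifier: str) -> str:
    """S256 code challenge: BASE64URL(SHA256(ASCII(verifier)))."""
    return b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def groups_from_token(token: str, claim: str = "groups") -> list[str]:
    """Read a list-valued claim from a JWT payload.

    The signature is NOT verified here; use this for debugging only.
    The default claim name is an assumption about the mapper config.
    """
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return list(payload.get(claim, []))


if __name__ == "__main__":
    verifier = new_verifier()
    print("code_verifier: ", verifier)
    print("code_challenge:", pkce_challenge(verifier))

    # Hypothetical unsigned token carrying one group, for illustration.
    body = b64url(json.dumps({"groups": ["TERRAKUBE_ADMIN"]}).encode())
    print(groups_from_token(f"header.{body}.signature"))
```

Because the client is public, there is no client secret to protect the code exchange; the verifier/challenge pair is what ties the authorization request to the token request.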

Configuration

Source: shared/helm-values/terrakube.yaml + clusters/<cluster>/values/terrakube.yaml. Two ArgoCD Applications: terrakube-prereqs (Postgres + Redis) at sync-wave N-1, then terrakube (the four components) at wave N.
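The wave ordering can be sketched as follows. ArgoCD reads the argocd.argoproj.io/sync-wave annotation and syncs lower waves first, treating a missing annotation as wave 0. The concrete wave numbers below (4 and 5) are placeholders standing in for N-1 and N; the real values are not given here.

```python
SYNC_WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def sync_order(apps: dict[str, dict[str, str]]) -> list[str]:
    """Return Application names ordered by sync wave, lowest first.

    `apps` maps an Application name to its annotations. A missing
    annotation counts as wave 0, matching ArgoCD's default.
    """
    def wave(name: str) -> int:
        return int(apps[name].get(SYNC_WAVE_ANNOTATION, "0"))

    return sorted(apps, key=wave)


if __name__ == "__main__":
    # Placeholder waves: 4 stands in for N-1, 5 for N.
    apps = {
        "terrakube": {SYNC_WAVE_ANNOTATION: "5"},
        "terrakube-prereqs": {SYNC_WAVE_ANNOTATION: "4"},
    }
    print(sync_order(apps))
```

The point of the split is that Postgres and Redis are healthy before the api starts, so the api does not crash-loop while waiting for its database.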

DR-specific issue: Docker Hub pulls of azbuilder/api-server:2.30.1 and azbuilder/executor:2.30.1 on DR take >15 min and often fail. Workaround: pull the images with podman on the dl385 host, then manually ctr import the image tarballs on each DR node. Long-term fix: Nexus docker-proxy (planned).
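The workaround can be sketched as a small helper that only builds the command lines, so they can be reviewed before anyone runs them. The docker.io/ prefix, the tarball naming scheme, and the use of the k8s.io containerd namespace (the one the kubelet reads images from) are assumptions. Copying the tarballs to each node is left out, and on some distributions ctr also needs an --address flag pointing at the containerd socket.

```python
import shlex

# Image references from this page; the docker.io/ prefix is assumed.
IMAGES = [
    "docker.io/azbuilder/api-server:2.30.1",
    "docker.io/azbuilder/executor:2.30.1",
]


def tar_name(image: str) -> str:
    """Hypothetical tarball name, e.g. azbuilder_api-server_2.30.1.tar."""
    short = image.removeprefix("docker.io/")
    return short.replace("/", "_").replace(":", "_") + ".tar"


def staging_commands(image: str) -> list[list[str]]:
    """Commands for the staging host: pull, then save to a tarball."""
    return [
        ["podman", "pull", image],
        ["podman", "save", "-o", tar_name(image), image],
    ]


def import_command(image: str) -> list[str]:
    """Command for each DR node: import the tarball into containerd."""
    return ["ctr", "--namespace", "k8s.io", "images", "import", tar_name(image)]


if __name__ == "__main__":
    for image in IMAGES:
        for cmd in staging_commands(image):
            print("staging host:", shlex.join(cmd))
        print("each DR node:", shlex.join(import_command(image)))
```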

Notable settings: nginx.ingress.kubernetes.io/ssl-redirect=false (MR #32; the registry needs to serve HTTP for Terraform's CLI) and the registry memory limit raised to 1Gi (MR #31; it was OOM-killed at 512Mi).

Operations

Failover

Three edge HAProxy backends, terrakube-{ui,api,reg}-rke2-be, each with a DC primary and a DR backup. Healthcheck: GET /actuator/health for api/registry, GET / for ui. A check passes when the response status matches ^(200|301|302|307|308)$.
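The health-check rules can be written down as a short sketch, useful when reproducing the check by hand with curl or a script. It expands the three backend names, picks the probe path per component, and applies the status pattern quoted above. This mirrors the description on this page and is not the HAProxy configuration itself.

```python
import re

# Status pattern as quoted in the health-check description.
EXPECTED_STATUS = re.compile(r"^(200|301|302|307|308)$")

COMPONENTS = ("ui", "api", "reg")


def backend_name(component: str) -> str:
    """Expand terrakube-{ui,api,reg}-rke2-be for one component."""
    return f"terrakube-{component}-rke2-be"


def health_path(component: str) -> str:
    """ui is probed at /, api and registry at /actuator/health."""
    return "/" if component == "ui" else "/actuator/health"


def is_healthy(status: int) -> bool:
    """True when the HTTP status matches the expected pattern."""
    return EXPECTED_STATUS.match(str(status)) is not None


if __name__ == "__main__":
    for component in COMPONENTS:
        print(backend_name(component), "GET", health_path(component))
    for status in (200, 302, 303, 503):
        print(status, is_healthy(status))
```

Accepting the redirect codes means a backend that answers with a redirect still counts as up, while any other status, including 303 and all 5xx responses, marks it down.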

Smoke test passed 2026-05-05: cutover took ~16 s and cutback ~65 s, with the Spring Boot api cold-start dominating the cutback window. The DR backup served 200s continuously throughout.

Caveat: workspace state lives in MinIO, so it is shared cleanly across clusters. Job history lives in each cluster's own Postgres, so DR has no record of DC's plans until they are manually exported.

References