Quick facts

Role: PersistentVolume provisioner
Backed by: Local disks on each RKE2 node
Used by: Vault, Kafka, Postgres, Redis, MinIO-on-cluster (where applicable), AWX

What it is

Longhorn provides the default storage class on each cluster and backs every StatefulSet PVC. Volume replication happens only within a cluster. Cross-site (DC/DR) durability of application data relies on Kafka MirrorMaker 2 and the redis-applier WAL, not on cross-cluster volume replication.

Architecture

Longhorn is deployed in the longhorn-system namespace in each cluster. The default StorageClass is named longhorn and uses a replica count of 3, so each volume keeps three copies, each placed on a separate node. Volumes are exposed to workloads as standard PVs bound to PVCs; iSCSI is the in-cluster transport. Longhorn supports snapshots and, optionally, backups to S3-compatible storage as first-class features. Currently snapshots are taken ad hoc, and no scheduled snapshot policy is configured.
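
As a rough illustration of the StorageClass described above, the Python sketch below builds an equivalent manifest and prints it as JSON, which kubectl apply -f accepts. The provisioner name driver.longhorn.io and the numberOfReplicas parameter are Longhorn's documented StorageClass fields; the default-class annotation is an assumption inferred from "default StorageClass", and nothing here is copied from the actual chart values. The second helper shows the disk arithmetic implied by a replica count of 3.

```python
import json


def longhorn_storage_class(name: str = "longhorn", replicas: int = 3) -> dict:
    """Build a StorageClass manifest matching the description above (a sketch)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {
            "name": name,
            # Assumption: the class is marked default via the standard annotation.
            "annotations": {"storageclass.kubernetes.io/is-default-class": "true"},
        },
        "provisioner": "driver.longhorn.io",
        # Longhorn expects parameter values as strings.
        "parameters": {"numberOfReplicas": str(replicas)},
    }


def raw_footprint_gib(pvc_gib: float, replicas: int = 3) -> float:
    """Worst-case disk use across the cluster for a fully written volume,
    ignoring snapshot space: one full copy per replica."""
    return pvc_gib * replicas


if __name__ == "__main__":
    print(json.dumps(longhorn_storage_class(), indent=2))
    print(raw_footprint_gib(10))  # a 10 GiB PVC can consume up to 30 GiB of node disk
```

Because Longhorn volumes are thin-provisioned, actual usage is normally below this bound until the volume fills.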

Cross-cluster volume replication is NOT in scope. Application-level durability relies on Kafka (MirrorMaker 2) for the WAL pattern, and on application-specific replication for Postgres (per-app StatefulSets in each cluster). Longhorn's own cross-cluster mechanism, disaster recovery (DR) volumes restored from a shared backup target, is not configured and we don't use it.

Configuration

Source: clusters/<cluster>/infra/longhorn/ (chart values + StorageClass overrides). The install uses the chart's defaults, except that defaultDataPath is set to match the lab's disk layout.

Apps that explicitly set storageClassName to longhorn: Vault (audit + data PVCs), Keycloak Postgres, Kafka brokers + controllers, Spotahome Redis, AWX Postgres, Terrakube Postgres + Redis, RedisInsight, SigNoz ClickHouse.
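
To check which PVCs actually land on this class, a small filter over the PVC list is enough. The sketch below assumes input shaped like the output of kubectl get pvc -A -o json; the sample namespaces and PVC names are hypothetical and only illustrate the structure.

```python
def pvcs_on_class(pvc_list: dict, storage_class: str = "longhorn") -> list:
    """Return sorted 'namespace/name' entries for PVCs using the given StorageClass."""
    matches = []
    for item in pvc_list.get("items", []):
        if item.get("spec", {}).get("storageClassName") == storage_class:
            meta = item["metadata"]
            matches.append(f"{meta['namespace']}/{meta['name']}")
    return sorted(matches)


# Hypothetical sample shaped like `kubectl get pvc -A -o json` output.
SAMPLE_PVCS = {
    "items": [
        {"metadata": {"namespace": "vault", "name": "data-vault-0"},
         "spec": {"storageClassName": "longhorn"}},
        {"metadata": {"namespace": "demo", "name": "scratch"},
         "spec": {"storageClassName": "local-path"}},
    ]
}

if __name__ == "__main__":
    print(pvcs_on_class(SAMPLE_PVCS))
```

To run it against a cluster, replace SAMPLE_PVCS with json.load(sys.stdin) and pipe in the kubectl output.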

Operations

Failover

Volumes themselves have no cross-cluster failover. Each cluster owns its own data plane. When DC's Vault scales to 0, the Longhorn volumes detach cleanly and Vault's Raft data is preserved on the PVCs. Bringing the pods back reattaches the volumes. This is what happened during the 2026-05-04 Vault failover smoke test (~9 s edge HAProxy flip; the volumes themselves were untouched on each side).
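
One way to confirm the "detach cleanly" step after a scale-down is to inspect the Longhorn Volume objects. The sketch below assumes input shaped like kubectl -n longhorn-system get volumes.longhorn.io -o json, and assumes the status.state and status.kubernetesStatus fields as they appear in recent Longhorn releases; verify the field names against the installed version. The namespace, volume, and PVC names in the sample are hypothetical.

```python
def volume_states(volume_list: dict, namespace: str) -> dict:
    """Map PVC name -> Longhorn volume state for PVCs in one namespace."""
    states = {}
    for vol in volume_list.get("items", []):
        status = vol.get("status", {})
        k8s = status.get("kubernetesStatus", {})
        if k8s.get("namespace") == namespace:
            # Fall back to the Longhorn volume name if no PVC name is recorded.
            key = k8s.get("pvcName") or vol["metadata"]["name"]
            states[key] = status.get("state", "unknown")
    return states


def all_detached(states: dict) -> bool:
    """True only if at least one volume was found and every one is detached."""
    return bool(states) and all(s == "detached" for s in states.values())


# Hypothetical sample shaped like the Volume list output.
SAMPLE_VOLUMES = {
    "items": [
        {"metadata": {"name": "pvc-aaa"},
         "status": {"state": "detached",
                    "kubernetesStatus": {"namespace": "vault", "pvcName": "data-vault-0"}}},
        {"metadata": {"name": "pvc-bbb"},
         "status": {"state": "detached",
                    "kubernetesStatus": {"namespace": "vault", "pvcName": "audit-vault-0"}}},
        {"metadata": {"name": "pvc-ccc"},
         "status": {"state": "attached",
                    "kubernetesStatus": {"namespace": "kafka", "pvcName": "data-kafka-0"}}},
    ]
}

if __name__ == "__main__":
    print(all_detached(volume_states(SAMPLE_VOLUMES, "vault")))
```

Treating an empty result as "not detached" is deliberate: a wrong namespace should fail the check rather than pass silently.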

Application-level state replication that DOES survive cluster loss: Kafka + MirrorMaker 2 (data plane WAL), Vault logical replication via MinIO (Phase 2), and per-app Postgres streaming replication where used.

References