Longhorn
Distributed block storage — the default StorageClass for every stateful workload in the platform.
What it is
Longhorn provides the default storage class on each cluster and backs every StatefulSet PVC. Replication is handled inside the cluster — DC/DR durability of application data rides on Kafka MirrorMaker 2 and the redis-applier WAL, not on cross-cluster volume replication.
Architecture
Longhorn is deployed in `longhorn-system` in each cluster. The default StorageClass is named `longhorn` with a replica count of 3 (each volume keeps one copy per node). Volumes are exposed as standard PersistentVolumes bound to PVCs anywhere in the cluster; iSCSI is the in-cluster transport. Snapshots and, optionally, backups to S3-compatible storage are first-class Longhorn concepts; today snapshots are taken ad hoc, and no scheduled snapshot policy is wired up.
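For orientation, here is a sketch of the StorageClass the chart ends up installing, using Longhorn's stock parameter names and defaults; the real object is chart-managed, so treat this as illustrative rather than something to hand-apply:

```bash
# Sketch of the default class; parameter values mirror Longhorn's own defaults.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "3"        # one copy per node
  staleReplicaTimeout: "2880"  # minutes before a failed replica is cleaned up
  fsType: "ext4"
EOF
```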
Cross-cluster volume replication is NOT in scope. Application-level durability rides on Kafka (MirrorMaker 2) for the WAL pattern, and on application-specific replication for Postgres (per-app StatefulSets in each cluster). Longhorn's volume-level cross-cluster replication is a paid feature and we don't use it.
Configuration
Source: `clusters/<cluster>/infra/longhorn/` (chart values + StorageClass overrides). It is a standard install on the chart's defaults, except `defaultDataPath` is tuned to the lab's disk layout.
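A minimal imperative equivalent of that install, assuming the stock `longhorn/longhorn` chart; the data path below is a made-up stand-in for the lab's real layout:

```bash
# Illustrative only: the real values live in clusters/<cluster>/infra/longhorn/.
# defaultSettings.defaultDataPath is the one deviation from chart defaults;
# /mnt/longhorn is a hypothetical path.
helm repo add longhorn https://charts.longhorn.io
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace \
  --set defaultSettings.defaultDataPath=/mnt/longhorn \
  --set persistence.defaultClassReplicaCount=3
```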
Apps that explicitly set `storageClassName: longhorn`: Vault (audit + data PVCs), Keycloak Postgres, Kafka brokers + controllers, Spotahome Redis, AWX Postgres, Terrakube Postgres + Redis, RedisInsight, SigNoz ClickHouse.
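What that pinning looks like on the PVC side; the name, namespace, and size here are hypothetical:

```bash
# Minimal PVC that lands on Longhorn, as the apps above do.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-example-0
  namespace: example
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
```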
Operations
- UI: `kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80`; there is no edge-routed Longhorn UI, it is admin-only.
- Volume health: `kubectl -n longhorn-system get volumes`, then `kubectl -n longhorn-system describe volume <name>` for replica state.
- Recover a degraded volume: usually self-heals once the node returns; for hard cases, the UI's "Salvage" button plus deleting the bad replica is the path.
- Disk pressure: shows up as `Pending` volumes; Longhorn schedules replicas only to nodes with free space. Add disks via the `node.longhorn.io` CRDs (see the sketch after this list).
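A hedged sketch of that disk-add path; the node name, disk key, and mount path are all placeholders:

```bash
# Merge an extra disk into the node's node.longhorn.io object so Longhorn
# can schedule replicas onto it. worker-1, extra-ssd, /mnt/extra: hypothetical.
kubectl -n longhorn-system patch nodes.longhorn.io worker-1 --type merge \
  -p '{"spec":{"disks":{"extra-ssd":{"path":"/mnt/extra","allowScheduling":true,"storageReserved":0}}}}'
```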
Failover
No cross-cluster failover for volumes themselves. Each cluster owns its own data plane. When DC's Vault scales to 0, the Longhorn volumes detach cleanly; Vault's Raft data is preserved on the PVCs. Bringing the pods back attaches the volumes again — this is what happened during the 2026-05-04 Vault failover smoke test (~9 s edge HAProxy flip; volumes themselves were untouched on each side).
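The detach/attach cycle from that test can be reproduced by hand; a sketch assuming a `vault` StatefulSet in a `vault` namespace (names and replica count are illustrative):

```bash
# Scaling Vault to 0 detaches its Longhorn volumes; data stays on the PVCs.
kubectl -n vault scale statefulset vault --replicas=0
kubectl -n longhorn-system get volumes -w   # watch state go attached -> detached
# Scaling back up re-attaches the same volumes.
kubectl -n vault scale statefulset vault --replicas=3
```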
Application-level state replication that DOES survive cluster loss: Kafka + MirrorMaker 2 (data plane WAL), Vault logical replication via MinIO (Phase 2), per-app Postgres streaming where used.
References
- Upstream Longhorn docs
- ADR-0008 (LVM storage default — historical, RHACM hub fleet) — Longhorn applies only to the RKE2 lab clusters
- STATE.md → "Foundation" row mentions agent2's Longhorn fix that brought DC/DR to parity