Longhorn
Distributed block storage — the default StorageClass for every stateful workload in the platform.
What it is
Longhorn provides the default storage class on each cluster and backs every StatefulSet PVC. Replication is handled inside the cluster — DC/DR durability of application data rides on Kafka MirrorMaker 2 and the redis-applier WAL, not on cross-cluster volume replication.
Architecture
Longhorn is deployed in `longhorn-system` in each cluster. The default StorageClass is named `longhorn` with a replica count of 3 (each volume keeps one copy per node). Volumes are exposed as standard PersistentVolumes bound to PVCs anywhere in the cluster; iSCSI is the in-cluster transport. Snapshots and, optionally, backups to S3-compatible storage are first-class Longhorn concepts; today snapshots are taken ad hoc, and no scheduled snapshot policy is wired up.
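For orientation, here is a sketch of the StorageClass the chart ends up installing, using Longhorn's stock parameter names and defaults; the real object is chart-managed, so treat this as illustrative rather than something to hand-apply:

```bash
# Sketch of the default class; parameter values mirror Longhorn's own defaults.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "3"        # one copy per node
  staleReplicaTimeout: "2880"  # minutes before a failed replica is cleaned up
  fsType: "ext4"
EOF
```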
Cross-cluster volume replication is NOT in scope. Application-level durability rides on Kafka (MirrorMaker 2) for the WAL pattern, and on application-specific replication for Postgres (per-app StatefulSets in each cluster). Longhorn's volume-level cross-cluster replication is a paid feature and we don't use it.
Configuration
Source: `clusters/<cluster>/infra/longhorn/` (chart values + StorageClass overrides). It is a standard install on the chart's defaults, except `defaultDataPath` is tuned to the lab's disk layout.
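A minimal imperative equivalent of that install, assuming the stock `longhorn/longhorn` chart; the data path below is a made-up stand-in for the lab's real layout:

```bash
# Illustrative only: the real values live in clusters/<cluster>/infra/longhorn/.
# defaultSettings.defaultDataPath is the one deviation from chart defaults;
# /mnt/longhorn is a hypothetical path.
helm repo add longhorn https://charts.longhorn.io
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace \
  --set defaultSettings.defaultDataPath=/mnt/longhorn \
  --set persistence.defaultClassReplicaCount=3
```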
Apps that explicitly set `storageClassName: longhorn`: Vault (audit + data PVCs), Keycloak Postgres, Kafka brokers + controllers, Spotahome Redis, AWX Postgres, Terrakube Postgres + Redis, RedisInsight, SigNoz ClickHouse.
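What that pinning looks like on the PVC side; the name, namespace, and size here are hypothetical:

```bash
# Minimal PVC that lands on Longhorn, as the apps above do.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-example-0
  namespace: example
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
```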
Operations
- UI: `kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80`; there is no edge-routed Longhorn UI, it is admin-only.
- Volume health: `kubectl -n longhorn-system get volumes`, then `kubectl -n longhorn-system describe volume <name>` for replica state.
- Recover a degraded volume: usually self-heals once the node returns; for hard cases, the UI's "Salvage" button plus deleting the bad replica is the path.
- Disk pressure: shows up as `Pending` volumes; Longhorn schedules replicas only to nodes with free space. Add disks via the `node.longhorn.io` CRDs (see the sketch after this list).
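A hedged sketch of that disk-add path; the node name, disk key, and mount path are all placeholders:

```bash
# Merge an extra disk into the node's node.longhorn.io object so Longhorn
# can schedule replicas onto it. worker-1, extra-ssd, /mnt/extra: hypothetical.
kubectl -n longhorn-system patch nodes.longhorn.io worker-1 --type merge \
  -p '{"spec":{"disks":{"extra-ssd":{"path":"/mnt/extra","allowScheduling":true,"storageReserved":0}}}}'
```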
Failover
No cross-cluster failover for volumes themselves. Each cluster owns its own data plane. When DC's Vault scales to 0, the Longhorn volumes detach cleanly; Vault's Raft data is preserved on the PVCs. Bringing the pods back attaches the volumes again — this is what happened during the 2026-05-04 Vault failover smoke test (~9 s edge HAProxy flip; volumes themselves were untouched on each side).
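The detach/attach cycle from that test can be reproduced by hand; a sketch assuming a `vault` StatefulSet in a `vault` namespace (names and replica count are illustrative):

```bash
# Scaling Vault to 0 detaches its Longhorn volumes; data stays on the PVCs.
kubectl -n vault scale statefulset vault --replicas=0
kubectl -n longhorn-system get volumes -w   # watch state go attached -> detached
# Scaling back up re-attaches the same volumes.
kubectl -n vault scale statefulset vault --replicas=3
```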
Application-level state replication that DOES survive cluster loss: Kafka + MirrorMaker 2 (data plane WAL), Vault logical replication via MinIO (Phase 2), per-app Postgres streaming where used.
References
- Upstream Longhorn docs
- ADR-0008 (LVM storage default — historical, RHACM hub fleet) — Longhorn applies only to the RKE2 lab clusters
- STATE.md → "Foundation" row mentions agent2's Longhorn fix that brought DC/DR to parity