HashiCorp Vault
Quick facts
Secrets store per cluster — Raft-3, Shamir-sealed (since 2026-05-06), DC→DR sync via logical export through MinIO.
What it is
Each cluster runs an independent 3-node Vault Raft cluster sealed with Shamir's Secret Sharing (operators hold 5 unseal-key shares per cluster; any 3 unseal a pod). After any pod restart — rolling helm upgrade, node reboot, eviction — the operator must feed 3 shares to each pod via vault operator unseal.
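To see at a glance which pods still need shares after a restart, something like the following helper works (illustrative only, not part of the repo; pod names match this deployment's vault-{0,1,2} in the vault namespace):

```shell
# Print the seal state of every Vault pod in one cluster. `vault status`
# exits non-zero when sealed, so we grep its text output instead of
# relying on the exit code.
seal_report() {
  for p in vault-0 vault-1 vault-2; do
    state=$(kubectl -n vault exec "$p" -- vault status 2>/dev/null | grep '^Sealed')
    echo "$p: ${state:-unreachable}"
  done
}
```

Run once per cluster (switch kubeconfigs for DR); any pod reporting `Sealed true` needs three shares fed to it.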
The 2026-05-05 transit auto-unseal target (vault-seed VM) was decommissioned; both clusters were re-initialised on Shamir 2026-05-06 (MR !39). Vault was empty pre-migration so no data was lost.
Cross-cluster DC→DR sync is no longer Raft-snapshot ship (per-cluster Shamir keyrings can't decrypt each other's master keys). Instead, a logical replicator (MR !40) runs as CronJobs in each cluster's vault namespace, exporting policies / auth methods / secret mounts / KV-v2 data through MinIO.
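Conceptually, one export round does something like the sketch below. This is illustrative only — the MR !40 script is the source of truth, and it also covers auth methods, secret mounts, and KV-v2 data; the function names and staging paths here are made up:

```shell
# Sketch of one logical-export round (illustrative, policies only).
export_policies() {   # dump every non-root policy as HCL into dir $1
  vault policy list | while read -r p; do
    [ "$p" = "root" ] && continue     # the root policy cannot be restored
    vault policy read "$p" > "$1/policy-$p.hcl"
  done
}
pack_and_ship() {     # tar the staging dir, push timestamped + latest copies
  ts=$(date -u +%Y%m%dT%H%M%SZ)
  tar czf "/tmp/$ts.tar.gz" -C "$1" .
  mc cp "/tmp/$ts.tar.gz" "lab/vault-snapshots-logical/dc/$ts.tar.gz"
  mc cp "/tmp/$ts.tar.gz" "lab/vault-snapshots-logical/dc/latest.tar.gz"
}
```

The import side replays the same artifacts in reverse: read the tarball, re-create policies and mounts, then write the KV data.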
Architecture
```
DC cluster                            MinIO                               DR cluster
──────────                            ─────                               ──────────
vault-{0,1,2}                                                             vault-{0,1,2}
├─ Raft HA                                                                ├─ Raft HA
└─ Shamir seal                                                            └─ Shamir seal
      │                                                                        │
CronJob/vault-replicator-export                                                │
  schedule: 0 * * * *                                                          │
  runs `vault export` ──tarball──▶  vault-snapshots-logical/                   │
                                      dc/<ts>.tar.gz                          │
                                      dc/latest.tar.gz                        │
                                                                               │
                                    vault-snapshots-logical/dc/latest          │
                                                      ──tarball──▶             │
                                                                               │
                                                        CronJob/vault-replicator-import
                                                          schedule: 15 */4 * * *
                                                          replays into DR vault
```
Per-cluster keys stay on disk on the operator host (~/cloud-init/vault-rke2-{dc,dr}-init-shamir.json, chmod 600). Edge HAProxy vault-rke2-be backend has DC primary + DR backup; failover happens at the L7 health-check layer (/v1/sys/health?standbyok=true&drsecondaryok=true) within ~9 s on DC pod scale-down.
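The relevant edge backend looks roughly like this — a sketch, not the deployed config: server addresses, ports, and TLS options are placeholders, and the check timings are chosen to match the ~9 s figure above:

```haproxy
backend vault-rke2-be
    # Active or standby both count as healthy; query params as on this page.
    option httpchk GET /v1/sys/health?standbyok=true&drsecondaryok=true
    default-server inter 2s fall 3 rise 2
    server dc <dc-ingress-vip>:443 check ssl verify none
    server dr <dr-ingress-vip>:443 check ssl verify none backup
```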
Configuration
Source of truth: shared/helm-values/vault.yaml (chart-level) and per-cluster overlays at clusters/{dc,dr}/values/vault.yaml.
Replication tooling lives at clusters/{dc,dr}/manifests/vault-replication/ (script ConfigMap + ServiceAccount + CronJob); the vault-replicator Secret with VAULT_TOKEN + MinIO creds is applied imperatively (out of git) — re-apply after any cluster rebuild.
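If it helps to script the re-apply after a rebuild, the Secret has roughly this shape — the key names inside `stringData` are assumptions, so check the CronJob's env for the real ones:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: vault-replicator
  namespace: vault
stringData:
  VAULT_TOKEN: "<replicator token>"        # token able to read exported paths
  MINIO_ACCESS_KEY: "<minio access key>"   # assumed key name
  MINIO_SECRET_KEY: "<minio secret key>"   # assumed key name
```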
Replicator runtime tools (mc + musl-built jq + libonig.so.5) are pre-staged in MinIO bucket vault-replicator-tools (anonymous read), pulled by the CronJob entrypoint via in-cluster wget — avoids the slow public-mirror outbound link.
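The entrypoint's pull step amounts to something like this sketch — the in-cluster MinIO URL and install paths are assumptions:

```shell
# Fetch the pre-staged tools from the anonymous-read bucket (illustrative).
fetch_tools() {
  base="$1"   # e.g. http://minio.minio.svc:9000/vault-replicator-tools (assumed)
  dest="$2"
  for t in mc jq libonig.so.5; do
    wget -q "$base/$t" -O "$dest/$t"
  done
  chmod +x "$dest/mc" "$dest/jq"   # the shared library stays non-executable
}
```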
Operations
Unseal after a restart (per pod, both clusters):
```shell
for i in 0 1 2; do
  KEY=$(jq -r ".unseal_keys_b64[$i]" ~/cloud-init/vault-rke2-dc-init-shamir.json)
  kubectl -n vault exec vault-0 -- vault operator unseal "$KEY"
done
```
Trigger an ad-hoc replication round:
```shell
kubectl -n vault create job --from=cronjob/vault-replicator-export vri-now
```
Verify replication ran:
```shell
mc ls lab/vault-snapshots-logical/dc/ | tail
mc cat lab/vault-snapshots-logical/dc/latest.tar.gz | tar tz
```
Failover
Edge HAProxy auto-fails over: vault-rke2-be backend has dc as primary and dr as backup; on DC pod scale-down, traffic flips to DR within ~9 s (3× 2 s health-check intervals).
Manual promotion of DR (DC dies for real):
- Suspend DR's import CronJob so it stops overwriting live writes:

```shell
kubectl --kubeconfig rke2-dr -n vault patch cronjob vault-replicator-import -p '{"spec":{"suspend":true}}'
```

- HAProxy already routes to DR; apps keep using vault-rke2.apps.sub.comptech-lab.com.
- After DC is restored: re-init the DC vault, run a one-off reverse export from DR and import into DC, then resume the normal direction.
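Once DC is healthy again, resuming the normal DC→DR direction is roughly the following — a sketch: the CronJob names are from this page, but the `rke2-dc` kubeconfig name is an assumed mirror of `rke2-dr`:

```shell
# Resume the normal DC→DR replication direction (sketch).
resume_normal_direction() {
  # Un-suspend DR's import so it follows DC exports again.
  kubectl --kubeconfig rke2-dr -n vault patch cronjob vault-replicator-import \
    -p '{"spec":{"suspend":false}}'
  # Kick an immediate export on DC so DR catches up without waiting an hour.
  kubectl --kubeconfig rke2-dc -n vault create job \
    --from=cronjob/vault-replicator-export "catchup-$(date +%s)"
}
```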
Known gaps: userpass passwords are NOT replicated (the export captures user metadata but Vault does not expose password hashes via API); reset on DR after import. Identity entities, PKI, transit deferred to v0.2.
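Resetting the imported users on DR is one `vault write` per user; the username and password below are placeholders:

```shell
# Re-set a userpass password on DR after an import (the hash itself cannot
# be copied out of the source cluster, so a new password must be written).
reset_userpass() {
  vault write "auth/userpass/users/$1" password="$2"
}
```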
References
- GitOps: clusters/dc/manifests/vault-replication/
- MR !39 — drop transit auto-unseal, switch to Shamir (Phase 1)
- MR !40 — logical replication DC → MinIO → DR (Phase 2)
- ADR-0016 (RKE2 DC/DR platform), ADR-0019 (wiki update on platform change)
- HashiCorp: seal migration · recovery mode