Quick facts

Chart
hashicorp/vault 0.31.0
Version
Vault 1.18.2
Mode
Raft HA, 3 replicas per cluster, Shamir seal (5 shares, 3 threshold)
Hostname
vault-rke2.apps.sub.comptech-lab.com (HAProxy DC primary + DR backup)
Failover (DC→DR)
~9 s (edge HAProxy, smoke-tested 2026-05-04)
Init artefacts
~/cloud-init/vault-rke2-{dc,dr}-init-shamir.json (chmod 600)
Cross-cluster sync
logical replication: hourly export DC → MinIO bucket vault-snapshots-logical/ → 4-hourly import into DR

What it is

Each cluster runs an independent 3-node Vault Raft cluster sealed with Shamir's Secret Sharing (operators hold 5 unseal-key shares per cluster; any 3 unseal a pod). After any pod restart — rolling helm upgrade, node reboot, eviction — the operator must feed 3 shares to each pod via vault operator unseal.

The 2026-05-05 transit auto-unseal target (the vault-seed VM) was decommissioned; both clusters were re-initialised on Shamir on 2026-05-06 (MR !39). Vault was empty pre-migration, so no data was lost.

Cross-cluster DC→DR sync is no longer Raft-snapshot ship (per-cluster Shamir keyrings can't decrypt each other's master keys). Instead, a logical replicator (MR !40) runs as CronJobs in each cluster's vault namespace, exporting policies / auth methods / secret mounts / KV-v2 data through MinIO.
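One export round can be sketched as follows. This is a minimal sketch, not the real CronJob script (which lives in clusters/{dc,dr}/manifests/vault-replication/, MR !40): the Vault dump step is shown only as comments since it needs a live cluster, and the placeholder file stands in for the real dumps. Only the timestamped + latest.tar.gz packaging matches the bucket layout documented above.

```shell
# Sketch of one export round; the dump loop is illustrative.
set -euo pipefail
ts=$(date -u +%Y%m%dT%H%M%SZ)
work=$(mktemp -d)

# 1. Dump logical state (cluster-dependent, needs VAULT_ADDR/VAULT_TOKEN), e.g.:
#      vault policy list -format=json       > "$work/policies.json"
#      vault auth list -format=json         > "$work/auth.json"
#      vault secrets list -format=json      > "$work/mounts.json"
#      vault kv get -format=json kv/app/cfg > "$work/kv-app-cfg.json"   # per KV-v2 path
echo '{"placeholder":true}' > "$work/policies.json"   # stand-in for real dumps

# 2. Package under the naming convention used in vault-snapshots-logical/dc/:
tar -czf "/tmp/$ts.tar.gz" -C "$work" .

# 3. Upload both the timestamped copy and the rolling latest pointer, e.g.:
#      mc cp "/tmp/$ts.tar.gz" lab/vault-snapshots-logical/dc/"$ts".tar.gz
#      mc cp "/tmp/$ts.tar.gz" lab/vault-snapshots-logical/dc/latest.tar.gz
tar -tzf "/tmp/$ts.tar.gz"
```

The dual upload is what lets the DR importer always fetch dc/latest.tar.gz while keeping timestamped copies for point-in-time recovery.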

Architecture

DC cluster                      MinIO                      DR cluster
──────────                      ─────                      ──────────
vault-{0,1,2}                                              vault-{0,1,2}
  ├─ Raft HA                                                 ├─ Raft HA
  └─ Shamir seal                                             └─ Shamir seal
        │                                                          │
  CronJob/vault-replicator-export                                  │
    schedule: 0 * * * *                                            │
    runs `vault export` ──tarball──▶ vault-snapshots-logical/      │
                                       dc/<ts>.tar.gz              │
                                       dc/latest.tar.gz            │
                                                                   │
                                              vault-snapshots-logical/dc/latest
                                                       ──tarball──▶
                                                                   │
                                                  CronJob/vault-replicator-import
                                                  schedule: 15 */4 * * *
                                                  replays into DR vault

Per-cluster keys stay on disk on the operator host (~/cloud-init/vault-rke2-{dc,dr}-init-shamir.json, chmod 600). Edge HAProxy vault-rke2-be backend has DC primary + DR backup; failover happens at the L7 health-check layer (/v1/sys/health?standbyok=true&drsecondaryok=true) within ~9 s on DC pod scale-down.
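The failover behaviour described above maps to an HAProxy backend along these lines. Illustrative fragment only: the backend name and health-check path are from this doc, but the server addresses and TLS options are assumptions.

```
backend vault-rke2-be
    option httpchk GET /v1/sys/health?standbyok=true&drsecondaryok=true
    default-server inter 2s fall 3 rise 2
    server dc <dc-ingress-addr>:443 check ssl verify none
    server dr <dr-ingress-addr>:443 check backup ssl verify none
```

With inter 2s and fall 3, dc is marked down after three consecutive failed checks and traffic shifts to the backup server, which lines up with the ~9 s failover window quoted above.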

Configuration

Source of truth: shared/helm-values/vault.yaml (chart-level) and per-cluster overlays at clusters/{dc,dr}/values/vault.yaml.

Replication tooling lives at clusters/{dc,dr}/manifests/vault-replication/ (script ConfigMap + ServiceAccount + CronJob); the vault-replicator Secret holding the VAULT_TOKEN and MinIO creds is applied imperatively (out of git) — re-apply it after any cluster rebuild.
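Re-creating that Secret after a rebuild looks roughly like this. The Secret name and VAULT_TOKEN key come from this doc; the MinIO key names are assumptions — check the replicator script for the exact env var names it expects.

```yaml
# Hypothetical shape; apply with: kubectl -n vault apply -f vault-replicator-secret.yaml
# Do NOT commit the populated file to git.
apiVersion: v1
kind: Secret
metadata:
  name: vault-replicator
  namespace: vault
type: Opaque
stringData:
  VAULT_TOKEN: "<token with export/import policy>"
  MINIO_ACCESS_KEY: "<minio access key>"     # key name is an assumption
  MINIO_SECRET_KEY: "<minio secret key>"     # key name is an assumption
```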

Replicator runtime tools (mc + musl-built jq + libonig.so.5) are pre-staged in MinIO bucket vault-replicator-tools (anonymous read), pulled by the CronJob entrypoint via in-cluster wget — avoids the slow public-mirror outbound link.
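The tool bootstrap can be pictured as a fragment of the CronJob's pod spec along these lines. Illustrative only: the bucket name is from this doc, but the image, in-cluster MinIO service URL, and script mount path are assumptions.

```yaml
# Illustrative fragment of the export CronJob's container spec.
containers:
- name: export
  image: alpine:3.20            # assumption; any image with wget works
  command: ["/bin/sh", "-c"]
  args:
  - |
    set -eu
    # Pull pre-staged tools from the anonymous-read bucket (service URL assumed):
    for f in mc jq libonig.so.5; do
      wget -q "http://minio.minio.svc.cluster.local:9000/vault-replicator-tools/$f" \
        -O "/usr/local/bin/$f"
    done
    chmod +x /usr/local/bin/mc /usr/local/bin/jq
    export LD_LIBRARY_PATH=/usr/local/bin   # musl-built jq links libonig.so.5
    /scripts/export.sh                      # hypothetical ConfigMap mount path
```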

Operations

Unseal after a restart (per pod, both clusters):

for pod in vault-0 vault-1 vault-2; do
  for i in 0 1 2; do
    KEY=$(jq -r ".unseal_keys_b64[$i]" ~/cloud-init/vault-rke2-dc-init-shamir.json)
    kubectl -n vault exec "$pod" -- vault operator unseal "$KEY"
  done
done
# for DR: add --kubeconfig rke2-dr and use the ...-dr-init-shamir.json artefact

Trigger an ad-hoc replication round:

kubectl -n vault create job --from=cronjob/vault-replicator-export vri-now

Verify replication ran:

mc ls lab/vault-snapshots-logical/dc/ | tail
mc cat lab/vault-snapshots-logical/dc/latest.tar.gz | tar tz

Failover

Edge HAProxy auto-fails over: vault-rke2-be backend has dc as primary and dr as backup; on DC pod scale-down, traffic flips to DR within ~9 s (3× 2 s health-check intervals).

Manual promotion of DR (DC dies for real):

  1. Suspend DR's import CronJob so it stops overwriting live writes:
     kubectl --kubeconfig rke2-dr -n vault patch cronjob vault-replicator-import -p '{"spec":{"suspend":true}}'
  2. HAProxy already routes to DR; apps keep using vault-rke2.apps.sub.comptech-lab.com unchanged.
  3. Once DC is restored: re-initialise DC's Vault, run a one-off reverse export from DR and import into DC, then resume the normal DC→DR direction.

Known gaps: userpass passwords are NOT replicated (the export captures user metadata, but Vault does not expose password hashes via its API); reset them on DR after an import. Identity entities, PKI, and transit are deferred to v0.2.

References