Apache Kafka (KRaft) — BRAC POC tools

Overview

Kafka is the streaming substrate for the platform. Two independent KRaft clusters (one per RKE2 cluster, three brokers + three controllers each) carry the Redis WAL (ADR-0018), MirrorMaker 2 cross-cluster replication, and traffic for any application that needs durable, ordered, replayable messaging. The two clusters know nothing about each other at the metadata layer; replication happens on top, at the topic layer, by MirrorMaker 2 running on DR.

Both clusters are managed by Strimzi (operator 0.46.1 reconciling a Kafka custom resource per cluster). External clients reach Kafka through the edge HAProxy by SNI on port 443, so the same FQDN set works against either side and clients can fail over at the TCP layer with no DNS changes.

Mode: KRaft (no ZooKeeper)
Topology per cluster: 3 broker pods + 3 controller pods (Strimzi KafkaNodePools)
Operator: Strimzi 0.46.1, watching the kafka namespace
Internal listener: scram on port 9095, TLS + SASL/SCRAM-SHA-512
External listener: Bootstrap + per-broker on port 443 via SNI passthrough on the edge HAProxy
AuthN / AuthZ: SCRAM-SHA-512 + Strimzi simple authorization (per-KafkaUser ACLs)
DR-only quirk: ANONYMOUS super-user on DR so the MirrorMaker 2 bootstrap path completes
Cross-cluster replication: MirrorMaker 2 on DR, pulling from DC
Carrier topics: redis-writes (Redis WAL — ADR-0018), jboss.* (external JBoss app)
External smoke: Validated end-to-end via kcat on 2026-05-06 (produce, consume, ACL deny)

The Kafka cluster (per side)

Topology

Each cluster is a self-contained KRaft Kafka. Strimzi reconciles two KafkaNodePools — three brokers and three controllers — into StrimziPodSets, plus the supporting Services, Secrets (cluster CA, clients CA, SCRAM credentials), and ConfigMaps. There is no ZooKeeper.

┌── one Kafka cluster (DC or DR) ─────────────────────────────┐
│                                                             │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐            │
│   │ controller │  │ controller │  │ controller │  KRaft     │
│   │   pod 0    │  │   pod 1    │  │   pod 2    │  metadata  │
│   └─────┬──────┘  └─────┬──────┘  └─────┬──────┘            │
│         │               │               │                   │
│         └───────────────┼───────────────┘                   │
│                         │ Raft                              │
│   ┌────────────┐  ┌─────┴──────┐  ┌────────────┐            │
│   │   broker   │  │   broker   │  │   broker   │  topics +  │
│   │   pod 0    │  │   pod 1    │  │   pod 2    │  partitions│
│   └────────────┘  └────────────┘  └────────────┘            │
│         ▲                                                   │
│         │  in-cluster clients  (scram, :9095, TLS+SCRAM)    │
│         │                                                   │
│         │  off-cluster clients (SNI on :443 via edge — see  │
│         │                       "External access" below)    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Listeners

Two listeners are exposed from each broker:

Internal — name: scram, port 9095, TLS + SASL/SCRAM-SHA-512. Used by every in-cluster client (MirrorMaker 2, redis-applier, any namespace-local consumer). Reachability is constrained by a NetworkPolicy in the kafka namespace.
External — name: external, type ingress, container port 9094, TLS + SASL/SCRAM-SHA-512, but advertised on port 443. Per-broker host overrides give every broker its own DNS name (broker-0, broker-1, broker-2) plus a shared bootstrap name, all under kafka.apps.sub.comptech-lab.com.

The "advertised port 443" detail is what makes the external path work. Clients ask the bootstrap broker for cluster metadata, and the broker tells them which host:port to talk to next. If the advertised port doesn't match the port that's actually open through the edge, the client follows the metadata into a closed door. We hit that exact bug — clients were redirected to :9094, which the edge doesn't expose — until configuration.brokers[*].advertisedPort was changed to 443.
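The failure mode can be reproduced on paper. The sketch below is a toy model, not Strimzi or client code: it assumes the edge exposes only port 443 for the Kafka names, and the function names are made up for illustration.

```python
# Toy model of "client follows broker metadata": which advertised endpoints
# can a client actually reach through the edge?
DOMAIN = "kafka.apps.sub.comptech-lab.com"
EDGE_OPEN_PORTS = {443}  # assumption: the only port the edge exposes for Kafka


def advertised_endpoints(advertised_port, brokers=(0, 1, 2)):
    """Endpoints the bootstrap broker would hand back in its metadata."""
    return [(f"broker-{i}.{DOMAIN}", advertised_port) for i in brokers]


def unreachable(endpoints, open_ports=EDGE_OPEN_PORTS):
    """Endpoints the client is told to use but cannot connect to."""
    return [(host, port) for host, port in endpoints if port not in open_ports]


if __name__ == "__main__":
    # Advertising the container port sends every client to a closed door.
    print(unreachable(advertised_endpoints(9094)))
    # Advertising 443 leaves nothing unreachable.
    print(unreachable(advertised_endpoints(443)))
```

The bootstrap connection succeeds in both cases, which is why the bug only shows up on the second hop, once the client dials the per-broker addresses from the metadata.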

For the TLS handshake to reach the broker unchanged, ingress-nginx is started with --enable-ssl-passthrough=true. HAProxy never decrypts; it routes by SNI and forwards the bytes.

AuthN, AuthZ, and KafkaUsers

Authentication is SCRAM-SHA-512 on both listeners. Authorization is Strimzi simple (Kafka's StandardAuthorizer); ACLs are declared on each KafkaUser resource, not in a global ACL list, so the GitOps definition of an identity carries its permissions with it.

Today's identities (each defined as one Strimzi KafkaUser per cluster):

mm2 — used by MirrorMaker 2. Broad ACLs (it replicates whole topics).
redis-applier — Read on redis-writes + on the redis-applier consumer-group prefix. Used by the in-cluster redis-applier.
jboss-client — external JBoss application identity. SCRAM password is shared across DC and DR, delivered via a pre-created Secret that Strimzi consumes when materialising the SCRAM credential record. ACLs: Read/Write/Describe/Create on the jboss. topic prefix, Read on the jboss. consumer-group prefix, IdempotentWrite on the cluster.
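To make the prefix ACLs concrete, here is a minimal sketch of how an allow-list like jboss-client's evaluates. It is a simplified model, not the Kafka authorizer: it ignores deny rules, host restrictions, and implied operations, and the Acl data layout is illustrative rather than the KafkaUser schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    resource_type: str      # "topic", "group", or "cluster"
    name: str               # resource name, or the prefix for prefix patterns
    pattern_type: str       # "literal" or "prefix"
    operations: frozenset   # operations this rule allows


# The jboss-client permissions as described in the text.
JBOSS_CLIENT_ACLS = [
    Acl("topic", "jboss.", "prefix", frozenset({"Read", "Write", "Describe", "Create"})),
    Acl("group", "jboss.", "prefix", frozenset({"Read"})),
    # "kafka-cluster" is the fixed resource name Kafka uses for cluster-level ACLs.
    Acl("cluster", "kafka-cluster", "literal", frozenset({"IdempotentWrite"})),
]


def is_allowed(acls, resource_type, name, operation):
    """Return True if any rule allows the operation; otherwise deny."""
    for acl in acls:
        if acl.resource_type != resource_type or operation not in acl.operations:
            continue
        if acl.pattern_type == "prefix" and name.startswith(acl.name):
            return True
        if acl.pattern_type == "literal" and name == acl.name:
            return True
    return False  # nothing matched: deny by default


if __name__ == "__main__":
    print(is_allowed(JBOSS_CLIENT_ACLS, "topic", "jboss.smoke", "Write"))
    print(is_allowed(JBOSS_CLIENT_ACLS, "topic", "redis-writes", "Read"))
```

Note that the prefix includes the trailing dot, so a topic named jbossx would not match.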

DR-only quirk: an ANONYMOUS entry is configured as a super.user on the DR cluster so MirrorMaker 2's auth handshake completes during the cross-cluster bootstrap sequence. Scoped to DR; not present on DC.

Certificate hierarchy

Strimzi runs its own per-cluster certificate hierarchy: a cluster CA (signs broker server certs and the listener TLS material) and a clients CA (signs KafkaUser client certs when mTLS auth is used). Both are auto-rotated roughly every 365 days. The current pair was issued 2026-05-04 and expires 2027-05-04; external clients must refresh their truststore before that or TLS handshakes will start failing.
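The calendar arithmetic is simple enough to script. The sketch below uses the two dates from the paragraph above; the 30-day renewal window is an assumption (Strimzi's documented default), since the configured value is not stated here.

```python
from datetime import date, timedelta

ISSUED = date(2026, 5, 4)    # from the text
EXPIRES = date(2027, 5, 4)   # from the text
RENEWAL_DAYS = 30            # assumption: Strimzi default renewal window


def renewal_window_opens(expires, renewal_days=RENEWAL_DAYS):
    """Date from which a replacement CA certificate can be expected."""
    return expires - timedelta(days=renewal_days)


def days_left(expires, today):
    """Days remaining before the CA certificate expires."""
    return (expires - today).days


if __name__ == "__main__":
    print("validity (days):", (EXPIRES - ISSUED).days)
    print("renewal window opens:", renewal_window_opens(EXPIRES))
    print("days left today:", days_left(EXPIRES, date.today()))
```

A reminder set at the start of the renewal window gives external clients the whole window to pick up the new CA before the old one expires.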

External access via the edge

Off-cluster clients (Java/JBoss apps on BRAC spoke networks, kcat on the lab host, anything outside the RKE2 clusters) reach Kafka through the edge HAProxy. The path is pure TCP + SNI passthrough — HAProxy never decrypts the TLS, it just reads the Server Name Indication and forwards the original handshake to ingress-nginx, which forwards it to the broker.

client                          edge HAProxy           ingress-nginx          Strimzi
                                (br30, public)         (in cluster)            broker
  │                                   │                      │                    │
  │── TLS ClientHello ───────────────▶│                      │                    │
  │   SNI: bootstrap.kafka.apps...    │                      │                    │
  │                                   │── reads SNI ────┐    │                    │
  │                                   │   matches ACL ──┘    │                    │
  │                                   │── TCP forward ──────▶│                    │
  │                                   │   to DC ingress      │                    │
  │                                   │   (DR backend in     │── ssl-passthrough ▶│
  │                                   │    "backup", only    │                    │
  │                                   │    used if DC down)  │                    │
  │                                   │                      │                    │
  │── TLS handshake reaches broker ──────────────────────────────────────────────▶│
  │── SASL/SCRAM-SHA-512 ─────────────────────────────────────────────────────────▶│
  │── Kafka protocol ─────────────────────────────────────────────────────────────▶│

HAProxy frontend & backend

The shared HTTPS frontend (public-apps-https) carries four SNI ACLs that match bootstrap.kafka.apps.sub.comptech-lab.com and broker-{0,1,2}.kafka.apps.sub.comptech-lab.com, sending matched traffic to the kafka-rke2-be backend.

mode tcp, balance roundrobin, option tcp-check.
3 DC servers active + 3 DR servers marked backup; DR only takes traffic when every DC server is L4-down.
Stats page shows all 6 servers L4OK in steady state.
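The frontend and backend behaviour described above can be modelled in a few lines. This is a sketch of the routing logic only, not HAProxy configuration; the server labels (dc-0, dr-0, and so on) are hypothetical.

```python
from dataclasses import dataclass

DOMAIN = "kafka.apps.sub.comptech-lab.com"
KAFKA_SNI = {f"{h}.{DOMAIN}" for h in ("bootstrap", "broker-0", "broker-1", "broker-2")}


@dataclass
class Server:
    name: str        # hypothetical label
    backup: bool     # True for the DR servers
    up: bool = True  # result of the L4 health check


def make_servers():
    dc = [Server(f"dc-{i}", backup=False) for i in range(3)]
    dr = [Server(f"dr-{i}", backup=True) for i in range(3)]
    return dc + dr


def pick_backend(sni):
    """Frontend step: the four SNI ACLs select the Kafka backend."""
    return "kafka-rke2-be" if sni in KAFKA_SNI else None


def eligible(servers):
    """Backend step: healthy active servers if any exist, else healthy backups."""
    active = [s for s in servers if not s.backup and s.up]
    if active:
        return active
    return [s for s in servers if s.backup and s.up]


if __name__ == "__main__":
    servers = make_servers()
    print(pick_backend(f"bootstrap.{DOMAIN}"), [s.name for s in eligible(servers)])
```

One detail the model glosses over: by default HAProxy sends traffic to only the first usable backup server unless option allbackups is set in the backend, so it is worth checking which behaviour the DR side is meant to have.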

DNS

PowerDNS holds explicit A records for bootstrap, broker-0, broker-1, and broker-2 under kafka.apps.sub.comptech-lab.com, all pointing to the edge IP. The wildcard *.apps doesn't synthesise answers for these nested names: once any record exists under kafka.apps, that label is an empty non-terminal (RFC 4592) and becomes the closest encloser, so every name beneath it needs its own explicit record.
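The wildcard rule is easier to see when executed. The sketch below is one reading of the RFC 4592 closest-encloser logic over a set of owner names. It is not PowerDNS code, it ignores record types and delegations, and the zone contents are illustrative.

```python
def ancestors(name):
    """Yield the name and each parent: a.b.c -> a.b.c, b.c, c."""
    labels = name.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def existing_names(owners):
    """Owner names plus every empty non-terminal they imply."""
    names = set()
    for owner in owners:
        names.update(ancestors(owner))
    return names


def resolve(qname, owners):
    """Classify a lookup as exact, nodata, wildcard, or nxdomain."""
    exists = existing_names(owners)
    if qname in owners:
        return "exact"
    if qname in exists:
        return "nodata"  # empty non-terminal: exists, but holds no records
    # The closest encloser is the nearest ancestor that exists in the zone.
    for anc in list(ancestors(qname))[1:]:
        if anc in exists:
            return "wildcard" if f"*.{anc}" in owners else "nxdomain"
    return "nxdomain"


ZONE = "sub.comptech-lab.com"
WILDCARD_ONLY = {ZONE, f"*.apps.{ZONE}"}
WITH_ONE_BROKER = WILDCARD_ONLY | {f"broker-0.kafka.apps.{ZONE}"}

if __name__ == "__main__":
    q = f"bootstrap.kafka.apps.{ZONE}"
    print(resolve(q, WILDCARD_ONLY))    # nothing exists under kafka.apps yet
    print(resolve(q, WITH_ONE_BROKER))  # kafka.apps is now an empty non-terminal
```

Under this model, adding a single explicit record beneath kafka.apps is enough to stop the wildcard from covering its siblings, which is consistent with needing all four records.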

Cross-cluster replication (MirrorMaker 2)

Topic data continuity across clusters is provided by MirrorMaker 2, which runs as a single Strimzi-managed KafkaMirrorMaker2 resource in the DR cluster's kafka namespace. MM2 pulls from DC and applies to DR. Offsets are not guaranteed to match across the two sides; MM2 records the source-to-target offset mapping in the standard mm2-offset-syncs.<source>.internal topic so consumer positions can be translated.

The DC cluster does not run MM2 — replication is one-directional today. If DC fails and DR takes over writes, those writes do not flow back to DC when DC recovers; reconciliation is manual.

What is replicated

redis-writes — the Redis WAL (ADR-0018). Mirrored continuously, partition keys preserved so per-key order is the same on both sides.
Not currently mirrored: jboss.*. The external JBoss app's edge auth path is failover-aware (same SCRAM creds, both CA certs trusted, edge HAProxy routes DC-primary/DR-backup), but its topic data is not. If data continuity matters for that app, add jboss\..* to the MM2 include list.
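Before changing the include list, the pattern can be checked offline. The sketch below assumes the combined pattern would be redis-writes plus the jboss. prefix (the exact pattern and the field it lives in, such as topicsPattern on the KafkaMirrorMaker2 resource, should be confirmed against the deployed CR). MM2 uses Java regular expressions, which behave the same as Python's for this simple pattern.

```python
import re

# Hypothetical include pattern: the current topic plus the jboss. prefix.
INCLUDE = r"redis-writes|jboss\..*"


def is_mirrored(topic, pattern=INCLUDE):
    """MM2 matches the whole topic name, so use fullmatch, not search."""
    return re.fullmatch(pattern, topic) is not None


if __name__ == "__main__":
    for topic in ("redis-writes", "jboss.smoke", "jbossXsmoke", "redis-writes-v2"):
        print(topic, is_mirrored(topic))
    # Without the backslash the dot matches any character.
    print(is_mirrored("jbossXsmoke", r"redis-writes|jboss..*"))
```

The escaped dot is what keeps the match scoped to the jboss. prefix; the unescaped form would also pull in any topic that merely starts with jboss followed by one more character.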

Client guidance

In-cluster clients

Use the internal listener:

Bootstrap: the Strimzi-managed bootstrap Service in the kafka namespace, port 9095.
Security: SASL_SSL + SCRAM-SHA-512. Mount the user's auto-generated KafkaUser Secret for credentials and the kafka-cluster-ca-cert Secret for the truststore.
Authorization: ACLs are declared on the KafkaUser. Don't bypass authz by adding a user to super.users — write a real ACL.
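As an illustration of the in-cluster settings above, the sketch below assembles a librdkafka-style configuration (the same property names kcat accepts with -X) from mounted Secrets. The bootstrap Service name and both mount paths are assumptions, not values taken from the deployment; the password and ca.crt keys are the ones Strimzi writes into SCRAM KafkaUser and cluster CA Secrets.

```python
from pathlib import Path

# Assumptions: adjust to the real Service name and volume mounts.
BOOTSTRAP = "kafka-kafka-bootstrap.kafka.svc:9095"
USER_SECRET_DIR = "/etc/kafka/user"        # KafkaUser Secret mounted as files
CA_SECRET_DIR = "/etc/kafka/cluster-ca"    # kafka-cluster-ca-cert mounted as files


def client_config(username, user_dir=USER_SECRET_DIR, ca_dir=CA_SECRET_DIR,
                  bootstrap=BOOTSTRAP):
    """Build client properties for the internal scram listener."""
    password = (Path(user_dir) / "password").read_text().strip()
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "SCRAM-SHA-512",
        "sasl.username": username,
        "sasl.password": password,
        "ssl.ca.location": str(Path(ca_dir) / "ca.crt"),
    }
```

Reading the password from the mounted file at start-up, rather than baking it into an image or environment file, means a credential rotation only needs a pod restart.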

Off-cluster clients

Use the external listener via the edge HAProxy:

Bootstrap: bootstrap.kafka.apps.sub.comptech-lab.com:443.
Security: SASL_SSL + SCRAM-SHA-512.
Truststore: a combined PEM containing both cluster CAs (concatenated DC + DR ca.crt). With both CAs trusted, the same client config validates against either side on failover.
Identity: a KafkaUser with the same SCRAM password on both clusters. Pre-create a Secret holding the password and let Strimzi consume it; clients hold one password and target either side.
Refresh annually. Strimzi rotates each cluster CA roughly every 365 days. The combined PEM must be regenerated before the earlier of the two CAs expires (currently 2027-05-04).
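Building the combined truststore is a plain concatenation. A minimal sketch, assuming the two cluster CA certificates have already been exported to local files (the file names in the usage line are hypothetical):

```python
from pathlib import Path

BEGIN = "-----BEGIN CERTIFICATE-----"


def combine_pems(paths, out_path):
    """Concatenate CA PEM files into one bundle; return the certificate count."""
    parts = []
    for p in paths:
        text = Path(p).read_text().strip()
        if BEGIN not in text:
            raise ValueError(f"{p} does not look like a PEM certificate")
        parts.append(text)
    bundle = "\n".join(parts) + "\n"
    Path(out_path).write_text(bundle)
    return bundle.count(BEGIN)


# Usage (hypothetical file names):
#   combine_pems(["dc-ca.crt", "dr-ca.crt"], "combined-ca.pem")
```

The returned count is a cheap sanity check: a bundle meant to trust both sides should contain at least two certificates.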

Quick smoke test (kcat)

kcat -b bootstrap.kafka.apps.sub.comptech-lab.com:443 \
     -X security.protocol=SASL_SSL \
     -X sasl.mechanism=SCRAM-SHA-512 \
     -X sasl.username=<user> \
     -X sasl.password=<password> \
     -X ssl.ca.location=<combined-ca.pem> \
     -L

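This was the validation path on 2026-05-06. The command above only lists cluster metadata (-L); the full validation also produced to jboss.smoke, consumed the message back, and confirmed that a read of an unrelated topic is denied with a topic authorization error (the condition Java clients surface as TopicAuthorizationException).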
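The three validation steps can be scripted as kcat invocations. The sketch below only builds the argument lists; it does not run anything. The choice of redis-writes as the unrelated topic is an assumption (jboss-client has no ACL on it according to the identities listed earlier), and the helper names are illustrative.

```python
import shlex

BOOTSTRAP = "bootstrap.kafka.apps.sub.comptech-lab.com:443"


def kcat_base(user, password, ca_file):
    """Connection flags shared by every step (same as the -L command)."""
    return [
        "kcat", "-b", BOOTSTRAP,
        "-X", "security.protocol=SASL_SSL",
        "-X", "sasl.mechanism=SCRAM-SHA-512",
        "-X", f"sasl.username={user}",
        "-X", f"sasl.password={password}",
        "-X", f"ssl.ca.location={ca_file}",
    ]


def smoke_commands(user, password, ca_file,
                   topic="jboss.smoke", denied_topic="redis-writes"):
    base = kcat_base(user, password, ca_file)
    return {
        # -P reads messages from stdin, one per line.
        "produce": base + ["-P", "-t", topic],
        # -C from the beginning, -e exits once the end of the topic is reached.
        "consume": base + ["-C", "-t", topic, "-o", "beginning", "-e"],
        # Expected to fail with a topic authorization error.
        "denied": base + ["-C", "-t", denied_topic, "-e"],
    }


if __name__ == "__main__":
    cmds = smoke_commands("<user>", "<password>", "<combined-ca.pem>")
    for step, argv in cmds.items():
        print(f"{step}: {shlex.join(argv)}")
```

Keeping the connection flags in one place makes it harder for the produce, consume, and deny steps to drift apart when the bootstrap name or truststore path changes.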

Evaluation

Strengths

KRaft removes a moving part. No ZooKeeper to operate, deploy, secure, or fail. Metadata lives in a Raft quorum on the controller pods.
Operator-driven lifecycle. Strimzi handles broker rolling restarts, certificate rotation, KafkaUser provisioning, ACLs, topic configuration. Day-to-day Kafka admin is mostly editing GitOps YAML.
Same FQDNs work in both clusters. External clients use a single bootstrap name and per-broker names; failover happens at L4 on the edge, transparent to the client. No DNS movement, no client reconfiguration.
Single trust bundle for failover. Concatenating both cluster CAs into one PEM lets clients validate either side without per-cluster client config.
AuthZ lives with the identity. ACLs on the KafkaUser CR mean the principal and its rights are reviewed together in the same MR.
Reusable substrate. The same Kafka clusters carry the Redis WAL, the JBoss application stream, and any future producer that needs durability + replay — no new infrastructure per app.

Weaknesses & known limitations

One-way MM2. DC → DR only. If DR ever serves writes during a DC outage, those writes don't flow back when DC returns; reconciliation is manual.
ANONYMOUS super-user on DR. Unrestricted access from the broker's perspective for any unauthenticated client that reaches the listener. The NetworkPolicy and the edge HAProxy ACLs are the load-bearing safeguards; if either is bypassed, ANONYMOUS is the gap.
SCRAM password sharing across clusters. The "same password on both sides" trick that makes failover trivial also means a leak forces a two-cluster rotation, and the password lives in a static Secret rather than a managed store.
Truststore expiry is calendar-bound. The combined PEM must be regenerated before the earlier of the two cluster CAs expires. A missed reminder is a hard outage on TLS handshake day.
No off-broker observability for MM2 lag. MM2's offset-sync topic is the source of truth for how far behind DR is, but there's no SigNoz dashboard for it yet.
MM2 jboss.* exclusion is implicit. The external JBoss app's auth path is failover-aware but its data path is not. A reader who sees "external Kafka clients fail over transparently" might miss the topic-data gap.
Cross-namespace secret bridging is imperative. The cluster CA + SCRAM creds are copied from kafka to consuming namespaces by hand at bootstrap; ExternalSecrets via Vault is queued in Phase R-G.
No quotas, no retention SLO. Client quotas (producer_byte_rate, consumer_byte_rate) are not configured for any identity, so a misbehaving producer can saturate a broker.

How it should be improved

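Bidirectional replication. A second MM2 deployment on DC pulling from DR closes the post-DR-write recovery story. Avoid loops by relying on source-alias topic prefixes (MM2's DefaultReplicationPolicy); IdentityReplicationPolicy keeps topic names unchanged and cannot detect cycles, so it is only safe for one-way flows.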
Drop ANONYMOUS super-user on DR. Have MM2 authenticate as the mm2 KafkaUser, scoped to the precise ACLs MM2 needs. Removes the catch-all and tightens audit.
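Per-identity quotas. Set producer and consumer byte-rate quotas on each KafkaUser (spec.quotas.producerByteRate and spec.quotas.consumerByteRate in the Strimzi CR, which map to Kafka's producer_byte_rate and consumer_byte_rate) so one runaway client can no longer starve the others.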
Lag SLO on MM2 + redis-applier. Wire kafka_consumer_group_lag (and the equivalent for MM2's offset-sync topic) into SigNoz; alert on minutes of lag.
CA-rotation calendar item. Codify the annual combined-PEM refresh as a scheduled GitOps job (or at minimum a calendar reminder).
ExternalSecrets via Vault. Replace the imperative secret bridging with ExternalSecrets sourced from Vault; rotate SCRAM passwords via Vault rather than static files.
Schema registry. Track an Apicurio or Confluent Schema Registry deployment alongside Kafka so producers and consumers agree on wire format; also unblocks compacted "latest state" topics for the Redis WAL.
OAuth (Keycloak) instead of SCRAM. Once Keycloak is the platform IdP, Strimzi supports oauth listener authentication. Tokens scoped per app + per topic prefix beats per-user SCRAM passwords.
Active/active per-topic. For specific topics that need bidirectional traffic, set up MM2 with active/active mirroring + topic-prefix conventions to avoid loops.
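The topic-prefix convention mentioned in the replication items above can be sketched as follows. This is a simplified model of the source-alias prefix scheme used by MM2's DefaultReplicationPolicy (mirrored topics are named source-alias, separator, topic), not MM2 code; the aliases dc and dr are hypothetical.

```python
SEPARATOR = "."


def remote_topic(source_alias, topic):
    """Name of a topic after it has been mirrored from source_alias."""
    return f"{source_alias}{SEPARATOR}{topic}"


def topic_source(topic):
    """Leading segment of a topic name, or None if there is no separator."""
    head, sep, _rest = topic.partition(SEPARATOR)
    return head if sep else None


def upstream_topic(topic):
    """Topic name with its leading segment removed, or None."""
    _head, sep, rest = topic.partition(SEPARATOR)
    return rest if sep else None


def would_cycle(topic, target_alias):
    """True if mirroring topic into target_alias would send data back home."""
    while topic is not None:
        source = topic_source(topic)
        if source is None:
            return False
        if source == target_alias:
            return True
        topic = upstream_topic(topic)
    return False


if __name__ == "__main__":
    mirrored = remote_topic("dc", "redis-writes")   # dc.redis-writes on DR
    print(would_cycle(mirrored, "dc"))               # copying it back to DC
    print(would_cycle("redis-writes", "dc"))         # a topic written locally on DR
```

Because topics such as jboss.smoke already contain dots, a prefix parser like this reads jboss as a candidate alias; that is harmless as long as no cluster alias collides with a topic prefix, which is worth keeping in mind when choosing aliases.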

References

Strimzi operator — the Kafka operator (0.46.1).
Apache Kafka — KRaft mode.
Strimzi listener configuration — listener types, configuration.brokers[*].advertisedPort, host overrides.
ADR-0018 — Redis-via-Kafka WAL (in ~/cloud-init/adr/).
RFC 4592 — wildcard DNS records and "empty non-terminal" labels (the reason explicit A records were needed for nested broker FQDNs).
Cross-references in this catalogue:
- Strimzi operator — the operator that reconciles the Kafka CR.
- MirrorMaker 2 — DC → DR carrier.
- Redis (RedisFailover) — consumer of the WAL.
- redis-applier — Kafka → Redis consumer.
- HAProxy (edge) — the SNI router for external clients.
- PowerDNS — explicit A records for bootstrap + broker-{0,1,2}.