Overview

Kafka is the streaming substrate for the platform. Two independent KRaft clusters (one per RKE2 cluster, three brokers + three controllers each) carry application messaging, cross-cluster replication, and any workload that needs durable, ordered, replayable streams. The two clusters know nothing about each other at the metadata layer — replication happens on top, at the topic layer, by MirrorMaker 2 running on DR.

Both clusters are managed by Strimzi (operator 0.46.1 reconciling a Kafka custom resource per cluster). External clients reach Kafka through the edge HAProxy by SNI on port 443, so the same FQDN set works against either side and clients can fail over at the TCP layer with no DNS changes.

Mode                        KRaft (no ZooKeeper)
Topology per cluster        3 broker pods + 3 controller pods (Strimzi KafkaNodePools)
Operator                    Strimzi 0.46.1, watching the kafka namespace
Internal listener           scram on port 9095, TLS + SASL/SCRAM-SHA-512
External listener           Bootstrap + per-broker on port 443 via SNI passthrough on the edge HAProxy
AuthN / AuthZ               SCRAM-SHA-512 + Strimzi simple authorization (per-KafkaUser ACLs)
DR-only quirk               ANONYMOUS super-user on DR so the MirrorMaker 2 bootstrap path completes
Cross-cluster replication   MirrorMaker 2 on DR, pulling from DC
External smoke              Validated end-to-end via kcat on 2026-05-06 (produce, consume, ACL deny)

The Kafka cluster (per side)

Topology

Each cluster is a self-contained KRaft Kafka. Strimzi reconciles two KafkaNodePools — three brokers and three controllers — into StrimziPodSets, plus the supporting Services, Secrets (cluster CA, clients CA, SCRAM credentials), and ConfigMaps. There is no ZooKeeper.
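As a sketch of what Strimzi reconciles, the two node pools might be declared like this (cluster name, storage sizes, and volume layout are illustrative, not taken from the real manifests):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  replicas: 3
  roles:
    - controller                       # KRaft metadata quorum only
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 20Gi                     # illustrative
        deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  replicas: 3
  roles:
    - broker                           # topics + partitions
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi                    # illustrative
        deleteClaim: false
```

Splitting roles into two pools keeps the metadata quorum isolated from broker load, which is why the diagram above shows six pods per cluster rather than three combined-role nodes.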

┌── one Kafka cluster (DC or DR) ─────────────────────────────┐
│                                                             │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐            │
│   │ controller │  │ controller │  │ controller │  KRaft     │
│   │   pod 0    │  │   pod 1    │  │   pod 2    │  metadata  │
│   └─────┬──────┘  └─────┬──────┘  └─────┬──────┘            │
│         │               │               │                   │
│         └───────────────┼───────────────┘                   │
│                         │ Raft                              │
│   ┌────────────┐  ┌─────┴──────┐  ┌────────────┐            │
│   │   broker   │  │   broker   │  │   broker   │  topics +  │
│   │   pod 0    │  │   pod 1    │  │   pod 2    │  partitions│
│   └────────────┘  └────────────┘  └────────────┘            │
│         ▲                                                   │
│         │  in-cluster clients  (scram, :9095, TLS+SCRAM)    │
│         │                                                   │
│         │  off-cluster clients (SNI on :443 via edge — see  │
│         │                       "External access" below)    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Listeners

Two listeners are exposed from each broker:

scram (internal)    port 9095, TLS + SASL/SCRAM-SHA-512, for clients inside the RKE2 cluster
external            advertised on port 443, bootstrap + per-broker FQDNs, reached via SNI passthrough on the edge HAProxy

The "advertised port 443" detail is what makes the external path work. Clients ask the bootstrap broker for cluster metadata, and the broker tells them which host:port to talk to next. If the advertised port doesn't match the port that's actually open through the edge, the client follows the metadata into a closed door. We hit that exact bug — clients were redirected to :9094, which the edge doesn't expose — until configuration.brokers[*].advertisedPort was changed to 443.
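The fix lives in the external listener block of the Kafka CR. A sketch consistent with the hostnames and ports documented here (other fields illustrative):

```yaml
listeners:
  - name: scram
    port: 9095
    type: internal
    tls: true
    authentication:
      type: scram-sha-512
  - name: external
    port: 9094                  # port on the broker itself
    type: ingress
    tls: true
    authentication:
      type: scram-sha-512
    configuration:
      bootstrap:
        host: bootstrap.kafka.apps.sub.comptech-lab.com
      brokers:
        - broker: 0
          host: broker-0.kafka.apps.sub.comptech-lab.com
          advertisedPort: 443   # what the broker tells clients to dial
        - broker: 1
          host: broker-1.kafka.apps.sub.comptech-lab.com
          advertisedPort: 443
        - broker: 2
          host: broker-2.kafka.apps.sub.comptech-lab.com
          advertisedPort: 443
```

The distinction that bit us is visible here: `port` is where the broker listens, while `advertisedPort` is what it writes into the metadata it hands back to clients. Only the latter has to match the edge.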

For the TLS handshake to reach the broker unchanged, ingress-nginx is started with --enable-ssl-passthrough=true. HAProxy never decrypts; it routes by SNI and forwards the bytes.
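On the ingress-nginx side, the flag is an argument to the controller binary. A Deployment excerpt (the flag is as documented; the surrounding fields are illustrative):

```yaml
spec:
  containers:
    - name: controller
      args:
        - /nginx-ingress-controller
        - --enable-ssl-passthrough=true   # forward TLS streams by SNI, never terminate
```

Without this flag, ingress-nginx would terminate TLS itself and the broker's SASL/SCRAM handshake inside the tunnel would never start.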

AuthN, AuthZ, and KafkaUsers

Authentication is SCRAM-SHA-512 on both listeners. Authorization is Strimzi simple (Kafka's StandardAuthorizer); ACLs are declared on each KafkaUser resource, not in a global ACL list, so the GitOps definition of an identity carries its permissions with it.

Identities are minimal — one Strimzi KafkaUser per cluster, deployed via GitOps. Because the ACLs live on the same CR as the user, the principal and its rights are reviewed together. Two patterns are in use today:

DR-only quirk: an ANONYMOUS entry is configured as a super.user on the DR cluster so MirrorMaker 2's auth handshake completes during the cross-cluster bootstrap sequence. Scoped to DR; not present on DC.
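A KafkaUser carrying its own ACLs might look like the following (user name, group prefix, and cluster label are hypothetical; the topic is the smoke-test topic mentioned later):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: smoke-writer                   # hypothetical principal
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  authentication:
    type: scram-sha-512                # Strimzi generates the password Secret
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: jboss.smoke
        operations:
          - Read
          - Write
          - Describe
      - resource:
          type: group
          name: smoke-writer-          # hypothetical consumer-group prefix
          patternType: prefix
        operations:
          - Read
```

Any topic not granted here is denied, which is exactly what the ACL-deny leg of the smoke test exercises.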

Certificate hierarchy

Strimzi runs its own per-cluster certificate hierarchy: a cluster CA (signs broker server certs and the listener TLS material) and a clients CA (signs KafkaUser client certs when mTLS auth is used). Both are auto-rotated roughly every 365 days. The current pair was issued 2026-05-04 and expires 2027-05-04; external clients must refresh their truststore before that or TLS handshakes will start failing.

External access via the edge

Off-cluster clients (Java/JBoss apps on BRAC spoke networks, kcat on the lab host, anything outside the RKE2 clusters) reach Kafka through the edge HAProxy. The path is pure TCP + SNI passthrough — HAProxy never decrypts the TLS, it just reads the Server Name Indication and forwards the original handshake to ingress-nginx, which forwards it to the broker.

client                          edge HAProxy           ingress-nginx          Strimzi
                                (br30, public)         (in cluster)            broker
  │                                   │                      │                    │
  │── TLS ClientHello ───────────────▶│                      │                    │
  │   SNI: bootstrap.kafka.apps...    │                      │                    │
  │                                   │── reads SNI ────┐    │                    │
  │                                   │   matches ACL ──┘    │                    │
  │                                   │── TCP forward ──────▶│                    │
  │                                   │   to DC ingress      │                    │
  │                                   │   (DR backend in     │── ssl-passthrough ▶│
  │                                   │    "backup", only    │                    │
  │                                   │    used if DC down)  │                    │
  │                                   │                      │                    │
  │── TLS handshake reaches broker ──────────────────────────────────────────────▶│
  │── SASL/SCRAM-SHA-512 ─────────────────────────────────────────────────────────▶│
  │── Kafka protocol ─────────────────────────────────────────────────────────────▶│

HAProxy frontend & backend

The shared HTTPS frontend (public-apps-https) carries four SNI ACLs that match bootstrap.kafka.apps.sub.comptech-lab.com and broker-{0,1,2}.kafka.apps.sub.comptech-lab.com, sending matched traffic to the kafka-rke2-be backend.
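A simplified sketch of the relevant HAProxy configuration, assuming TCP mode and SNI inspection (the real frontend carries other applications' ACLs as well; backend server addresses are illustrative):

```haproxy
frontend public-apps-https
    bind :443
    mode tcp
    # Hold the connection long enough to read the ClientHello SNI
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }

    acl kafka_bootstrap req_ssl_sni -i bootstrap.kafka.apps.sub.comptech-lab.com
    acl kafka_broker0   req_ssl_sni -i broker-0.kafka.apps.sub.comptech-lab.com
    acl kafka_broker1   req_ssl_sni -i broker-1.kafka.apps.sub.comptech-lab.com
    acl kafka_broker2   req_ssl_sni -i broker-2.kafka.apps.sub.comptech-lab.com
    use_backend kafka-rke2-be if kafka_bootstrap or kafka_broker0 or kafka_broker1 or kafka_broker2

backend kafka-rke2-be
    mode tcp
    server dc-ingress <dc-ingress-ip>:443 check
    server dr-ingress <dr-ingress-ip>:443 check backup   # only used if DC is down
```

The `backup` keyword is what implements the "DR backend only used if DC down" behaviour in the sequence diagram above, giving clients TCP-level failover with no DNS change.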

DNS

PowerDNS holds explicit A records for bootstrap, broker-0, broker-1, and broker-2 under kafka.apps.sub.comptech-lab.com, all pointing to the edge IP. The wildcard *.apps doesn't synthesise for these nested labels (RFC 4592 — empty non-terminal), so the explicit records are required.
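In zone-file terms the four records look like this (the edge IP shown is a documentation placeholder, not the real address):

```
bootstrap.kafka.apps.sub.comptech-lab.com.  300  IN  A  192.0.2.10
broker-0.kafka.apps.sub.comptech-lab.com.   300  IN  A  192.0.2.10
broker-1.kafka.apps.sub.comptech-lab.com.   300  IN  A  192.0.2.10
broker-2.kafka.apps.sub.comptech-lab.com.   300  IN  A  192.0.2.10
```

All four point at the same edge IP; it is the SNI in the TLS handshake, not DNS, that steers each name to the right broker.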

Cross-cluster replication (MirrorMaker 2)

Topic data continuity across clusters is provided by MirrorMaker 2, which runs as a single Strimzi-managed KafkaMirrorMaker2 resource in the DR cluster's kafka namespace. MM2 pulls from DC and applies to DR, preserving offset mappings via the standard mm2-offset-syncs.<source>.internal topic.

The DC cluster does not run MM2 — replication is one-directional today. If DC fails and DR takes over writes, those writes do not flow back to DC when DC recovers; reconciliation is manual.

What is replicated

The MM2 include list explicitly enumerates the topics that cross the boundary; partition keys are preserved so per-key order is the same on both sides. Adding a topic to replication is a one-line MR against clusters/dr/manifests/kafka/mirrormaker2.yaml.
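A minimal sketch of the KafkaMirrorMaker2 resource, assuming the aliases `dc` and `dr` and illustrative bootstrap addresses, secret names, and topic list (only `jboss.smoke` appears in this document; the real include list is longer):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: mm2                            # hypothetical
  namespace: kafka
spec:
  replicas: 1
  connectCluster: dr                   # the Connect workers run against DR (the target)
  clusters:
    - alias: dc                        # source, reached over the external listener
      bootstrapServers: bootstrap.kafka.apps.sub.comptech-lab.com:443
      tls:
        trustedCertificates:
          - secretName: dc-cluster-ca-cert   # hypothetical secret name
            certificate: ca.crt
      authentication:
        type: scram-sha-512
        username: mm2                  # hypothetical
        passwordSecret:
          secretName: mm2-password
          password: password
    - alias: dr                        # target, reached in-cluster; the ANONYMOUS
      bootstrapServers: dr-kafka-bootstrap.kafka.svc:9095   # super-user quirk covers
      tls: {}                                               # the unauthenticated hop
  mirrors:
    - sourceCluster: dc
      targetCluster: dr
      topicsPattern: "jboss.smoke"     # explicit include list; extend one line per topic
      sourceConnector:
        config:
          replication.factor: 3
      checkpointConnector:
        config:
          checkpoints.topic.replication.factor: 3
```

Adding a topic is the one-line change to `topicsPattern` (or its list form) that the MR workflow against clusters/dr/manifests/kafka/mirrormaker2.yaml captures.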

Client guidance

In-cluster clients

Use the internal listener:
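For a Java client inside the cluster, a minimal configuration consistent with the internal listener would look like this (the bootstrap service name follows the Strimzi `<cluster>-kafka-bootstrap` convention with a hypothetical cluster name; truststore path and credentials are placeholders):

```properties
bootstrap.servers=my-cluster-kafka-bootstrap.kafka.svc:9095
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="<user>" password="<password>";
# Trust the Strimzi cluster CA (PEM extracted from the <cluster>-cluster-ca-cert Secret)
ssl.truststore.location=/path/to/cluster-ca.crt
ssl.truststore.type=PEM
```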

Off-cluster clients

Use the external listener via the edge HAProxy:
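For the Java/JBoss apps on the spoke networks, the same shape with the external bootstrap address (placeholders as in the kcat example below):

```properties
bootstrap.servers=bootstrap.kafka.apps.sub.comptech-lab.com:443
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="<user>" password="<password>";
ssl.truststore.location=<combined-ca.pem>
ssl.truststore.type=PEM
```

Because the advertised port is 443 everywhere, the client needs no special redirect handling; the metadata it fetches from bootstrap already points at ports the edge exposes.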

Quick smoke test (kcat)

kcat -b bootstrap.kafka.apps.sub.comptech-lab.com:443 \
     -X security.protocol=SASL_SSL \
     -X sasl.mechanism=SCRAM-SHA-512 \
     -X sasl.username=<user> \
     -X sasl.password=<password> \
     -X ssl.ca.location=<combined-ca.pem> \
     -L

This was the validation path on 2026-05-06: produce to jboss.smoke, consume the message back, confirm an ACL-denied read of an unrelated topic returns TopicAuthorizationException.

Evaluation

Strengths

Weaknesses & known limitations

How it should be improved

References