SigNoz
OpenTelemetry-native APM — no-ZooKeeper, single-node ClickHouse profile per cluster.
Quick facts
What it is
Collects traces and metrics from in-cluster workloads. Argo CD reports the app as OutOfSync, but the drift is cosmetic: the Kubernetes API server adds default fields (schedulerName, dnsPolicy, terminationGracePeriodSeconds, etc.) that aren't in the source manifests. Functionally the install is healthy.
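To illustrate why defaulted fields show up as drift, here is a minimal Python sketch that walks a desired object and its live counterpart and lists the paths that exist only in the live one. It is a simplified model, not Argo CD's actual diff, which also applies normalization and any ignoreDifferences rules. defaulted_paths and the sample objects are hypothetical, though the values shown are the defaults Kubernetes fills in.

```python
from typing import Any


def defaulted_paths(desired: Any, live: Any, path: str = "") -> list[str]:
    """Return JSON-pointer-style paths present in `live` but absent in `desired`.

    Dicts are compared key by key and lists index by index. Changed scalar
    values are ignored: we only want fields the API server *added*.
    Keys are not escaped (~0/~1), which is fine for ordinary Kubernetes fields.
    """
    extra: list[str] = []
    if isinstance(desired, dict) and isinstance(live, dict):
        for key, live_value in live.items():
            child = f"{path}/{key}"
            if key not in desired:
                extra.append(child)
            else:
                extra.extend(defaulted_paths(desired[key], live_value, child))
    elif isinstance(desired, list) and isinstance(live, list):
        for i, (d, l) in enumerate(zip(desired, live)):
            extra.extend(defaulted_paths(d, l, f"{path}/{i}"))
    return extra


# Hypothetical Deployment fragment as written in Git...
DESIRED = {"spec": {"template": {"spec": {
    "containers": [{"name": "query-service"}],
}}}}

# ...and as the API server returns it after applying its defaults.
LIVE = {"spec": {"template": {"spec": {
    "containers": [{"name": "query-service",
                    "terminationMessagePath": "/dev/termination-log"}],
    "schedulerName": "default-scheduler",
    "dnsPolicy": "ClusterFirst",
    "terminationGracePeriodSeconds": 30,
}}}}
```

Calling defaulted_paths(DESIRED, LIVE) lists the container-level terminationMessagePath plus the three pod-level fields, which is the kind of difference that keeps the badge yellow even though nothing is actually wrong.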
Architecture
Per-cluster install. Components: frontend (the SigNoz web UI: nginx serving the React SPA), query-service (Go backend reading from ClickHouse), otel-collector and otel-collector-metrics (OpenTelemetry receivers and processors writing into ClickHouse), and a single-node clickhouse StatefulSet on Longhorn storage.
The "no ZK" profile drops the ZooKeeper dependency that older SigNoz layouts required for ClickHouse coordination; a single-node ClickHouse doesn't need it. A previous deploy with ZooKeeper left orphan ConfigMaps, and Argo's diff against those is another source of the cosmetic OutOfSync (MR #7-#8).
Configuration
Source: clusters/<cluster>/manifests/signoz/ — raw manifests adapted from the SigNoz Helm chart values, simplified for the lab's single-node ClickHouse.
Receivers exposed: OTLP gRPC (4317), OTLP HTTP (4318), Jaeger Thrift (14268). Apps in the cluster send traces and metrics by setting OTEL_EXPORTER_OTLP_ENDPOINT to http://otel-collector.signoz:4317 (OTLP gRPC); exporters configured for OTLP HTTP need port 4318 instead.
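The endpoint port has to agree with the exporter protocol, which OpenTelemetry SDKs read from OTEL_EXPORTER_OTLP_PROTOCOL. The OpenTelemetry spec recommends http/protobuf as the default, so an SDK left on its default may need the 4318 endpoint rather than 4317. Below is a minimal Python sketch of that pairing. The helpers otlp_env and as_container_env are hypothetical. OTEL_SERVICE_NAME is included because it sets the service.name resource attribute that traces are grouped under.

```python
# Receiver ports from the list above, keyed by the values the OpenTelemetry
# spec defines for OTEL_EXPORTER_OTLP_PROTOCOL.
OTLP_PORTS = {"grpc": 4317, "http/protobuf": 4318, "http/json": 4318}


def otlp_env(service_name: str, protocol: str = "grpc",
             host: str = "otel-collector.signoz") -> dict[str, str]:
    """Return OTel SDK env vars pointing at the in-cluster collector."""
    if protocol not in OTLP_PORTS:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{host}:{OTLP_PORTS[protocol]}",
    }


def as_container_env(env: dict[str, str]) -> list[dict[str, str]]:
    """Convert to the name/value list a Kubernetes container spec expects."""
    return [{"name": k, "value": v} for k, v in env.items()]
```

Setting the protocol explicitly alongside the endpoint, rather than relying on each SDK's default, keeps the port choice unambiguous across languages.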
No OIDC integration today — the SigNoz UI uses its own user database. Auth-via-Keycloak is on the wishlist.
Operations
- UI: https://signoz.apps.sub.comptech-lab.com on whichever cluster is fronted; per-cluster instances, with no DC-primary/DR-backup edge backend yet.
- ClickHouse direct query: kubectl -n signoz exec clickhouse-0 -- clickhouse-client --query "SHOW TABLES FROM signoz_traces"
- Inspect collector receiver health: kubectl -n signoz logs deploy/otel-collector | head
- Send a test trace: otelgen or otel-cli, pointed at the OTLP endpoint inside the cluster.
- Cosmetic OOS: ignore until Phase R-E; functional impact is zero. Fix options: a global resource.customizations.ignoreDifferences in argocd-cm for StatefulSet/Deployment/Pod default fields, or per-app ignoreDifferences blocks. MR #37/#38 attempted server-side diff fixes that didn't fully clear the badges.
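For a smoke test without installing otel-cli, you can also post a single span to the OTLP HTTP receiver (4318) as OTLP/JSON. Below is a minimal sketch that uses only the Python standard library. The URL assumes the otel-collector service in the signoz namespace. The function names, service name, and scope name are hypothetical.

```python
import json
import os
import time
import urllib.request

# Assumption: OTLP HTTP receiver reachable at the collector service on 4318.
DEFAULT_URL = "http://otel-collector.signoz:4318/v1/traces"


def build_test_span(service_name: str = "otlp-smoke-test",
                    span_name: str = "test-span") -> dict:
    """Return a one-span OTLP/JSON trace export request body."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    # OTLP/JSON encodes trace and span IDs as hex, not base64.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit timestamps as strings, per the protobuf JSON mapping.
                    "startTimeUnixNano": str(now_ns - 50_000_000),
                    "endTimeUnixNano": str(now_ns),
                }],
            }],
        }],
    }


def send_test_span(url: str = DEFAULT_URL, timeout: float = 5.0) -> int:
    """POST one span and return the HTTP status; non-2xx raises HTTPError."""
    body = json.dumps(build_test_span()).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

From inside the cluster, call send_test_span() directly. From a workstation, run kubectl -n signoz port-forward deploy/otel-collector 4318:4318 and call send_test_span("http://localhost:4318/v1/traces"). The span should then show up under the otlp-smoke-test service in the UI.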
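The per-app option means adding spec.ignoreDifferences entries to the Argo CD Application. Argo CD matches each entry on group and kind and skips the listed JSON pointers when diffing. Below is a minimal Python sketch that generates such entries for the pod-template defaults named above. It assumes the Deployments and the clickhouse StatefulSet are the objects showing drift. The helper names are hypothetical.

```python
import json

# Pod-template fields the API server defaults (see "What it is").
POD_TEMPLATE_DEFAULTS = ["schedulerName", "dnsPolicy", "terminationGracePeriodSeconds"]


def ignore_differences(kinds: tuple[str, ...] = ("Deployment", "StatefulSet")) -> list[dict]:
    """Build Application spec.ignoreDifferences entries for apps/v1 workloads."""
    pointers = [f"/spec/template/spec/{field}" for field in POD_TEMPLATE_DEFAULTS]
    return [{"group": "apps", "kind": kind, "jsonPointers": pointers} for kind in kinds]


def application_spec_fragment() -> str:
    """Render the entries under spec: as JSON (also valid YAML)."""
    return json.dumps({"spec": {"ignoreDifferences": ignore_differences()}}, indent=2)
```

Because the Application itself is managed from Git, the output of application_spec_fragment() belongs in its manifest rather than in a live kubectl patch, which the next sync would revert.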
Failover
Per-cluster — no cross-cluster trace/metric replication. If DC dies, the trace history living in DC's ClickHouse goes with it (until VM-level recovery). DR's SigNoz only has traces from DR-side workloads.
No edge HAProxy backend is wired yet; this is a Plan-09 candidate.
References
- GitOps: clusters/dc/manifests/signoz/
- MR #7-#8 (no-ZK rollout)
- ADR-0013 (SigNoz on spoke-dc — historical RHACM context, unrelated to RKE2 lab)
- SigNoz docs · OpenTelemetry