Data and streaming
Atlas has two different data planes in this repository:
- streaming through Amazon MSK
- relational persistence through PostgreSQL RDS for the dashboard backend and Camunda
ClickHouse Cloud remains external to this Terraform stack, but staging can now observe one ClickHouse Cloud service through a dedicated Prometheus agent that scrapes the ClickHouse Cloud API and sends infrastructure metrics to New Relic.
terraform/staging2 does not create a second MSK cluster. It reuses the shared staging MSK cluster and its IAM + TLS broker path while keeping its own workload plane, database instances, cache, and secrets.
Kafka path
| Component | Current role |
|---|---|
| Events ingestion service | publishes Atlas events to Kafka |
| Dashboard backend | internal Atlas client that can reach scoring over Service Connect and consume/produce against MSK when its runtime enables Kafka use |
| Scoring service | internal-only, producer-only service that publishes scoring results to Kafka; it no longer consumes any Kafka topic |
| Amazon MSK | shared Kafka backbone |
| Kafka UI | operator view into brokers, topics, and messages |
| MSK Connect S3 sink | optional export path from Kafka topics to S3 |
The events secret template currently seeds:
KAFKA_EVENTS_TOPIC = "atlas.events.raw"
KAFKA_DLQ_TOPIC = "atlas.events.dlq"
KAFKA_SASL_MECHANISM = "AWS_MSK_IAM"
KAFKA_BROKERS = module.msk.bootstrap_brokers_tls
The scoring secret template currently seeds the values below. Scoring is producer-only and no longer consumes any Kafka topic:
KAFKA_OUTPUT_TOPIC = "atlas.l3.user.score"
KAFKA_DLQ_TOPIC = "atlas.scoring.dlq"
KAFKA_SASL_MECHANISM = "AWS_MSK_IAM"
KAFKA_BROKERS = module.msk.bootstrap_brokers_tls
CAMUNDA_BASE_URL = "http://camunda:8080/engine-rest"
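Kafka bootstrap strings such as the value seeded into KAFKA_BROKERS are comma-separated host:port pairs. As a rough sketch of how a client might validate that value before connecting, the helper below splits it into pairs. The function name and the broker hostnames in the usage note are hypothetical and do not come from this repository.

```python
def parse_bootstrap_brokers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated bootstrap string into (host, port) pairs."""
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas or stray whitespace
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers
```

For example, `parse_bootstrap_brokers("b-1.example.internal:9098,b-2.example.internal:9098")` yields two `(host, port)` tuples, and a value without a port raises `ValueError` instead of failing later inside the Kafka client.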
The dashboard backend Terraform code currently leaves this value commented for manual AWS-side addition:
SCORING_BASE_URL = "http://scoring:8083"
The dashboard backend ECS task role now receives MSK IAM permissions from Terraform through the same msk_cluster_arn wiring used by the other internal Kafka clients. Kafka broker addresses and any application-level topic settings remain owned by the dashboard runtime secret.
When bringing up staging2, review topic names, consumer groups, and Kafka UI display labels manually so the second workload plane does not accidentally masquerade as the original staging consumers.
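The manual review can be supported by a small check that flags settings with identical values in both environments. This is only a sketch: the KAFKA_CONSUMER_GROUP key and the group names are hypothetical examples, and a shared topic name may be intentional, so the result is a list to review rather than a list of errors.

```python
def find_shared_names(
    staging: dict[str, str],
    staging2: dict[str, str],
    keys: tuple[str, ...],
) -> list[str]:
    """Return the keys whose values are identical in both environments."""
    return [
        key
        for key in keys
        if key in staging and staging[key] == staging2.get(key)
    ]


# Hypothetical secret contents for illustration only.
staging = {
    "KAFKA_EVENTS_TOPIC": "atlas.events.raw",
    "KAFKA_CONSUMER_GROUP": "atlas-dashboard",
}
staging2 = {
    "KAFKA_EVENTS_TOPIC": "atlas.events.raw",
    "KAFKA_CONSUMER_GROUP": "atlas-dashboard-staging2",
}
shared = find_shared_names(
    staging, staging2, ("KAFKA_EVENTS_TOPIC", "KAFKA_CONSUMER_GROUP")
)
```

Here `shared` contains only `KAFKA_EVENTS_TOPIC`, which tells the operator to confirm that sharing the raw events topic is deliberate while the consumer groups stay distinct.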
MSK connectivity modes
| Endpoint type | Use |
|---|---|
| bootstrap_brokers_tls | internal IAM + TLS access for ECS services, dashboard backend, and Kafka UI |
| bootstrap_brokers_public_tls | public IAM + TLS access for external clients |
| multi-VPC connectivity | enabled in production values for private connectivity patterns |
The public bootstrap endpoint is controlled by msk_enable_public_access. When it is enabled, the MSK module sets broker public access to SERVICE_PROVIDED_EIPS; the VPC module separately controls which client CIDRs can reach port 9198 through msk_public_access_cidrs.
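To reason about whether an external client will be admitted, it can help to test its address against the configured CIDR list before debugging the Kafka client itself. The sketch below mirrors that check with the standard library. The function names and the sample addresses are illustrative; port 9098 for in-VPC IAM access is the standard MSK listener port and is not stated in this document.

```python
import ipaddress

MSK_IAM_PRIVATE_PORT = 9098  # standard MSK IAM listener inside the VPC
MSK_IAM_PUBLIC_PORT = 9198   # IAM listener used with public access


def iam_port(public: bool) -> int:
    """Pick the IAM listener port for the chosen endpoint type."""
    return MSK_IAM_PUBLIC_PORT if public else MSK_IAM_PRIVATE_PORT


def client_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR blocks."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

With a list such as `["203.0.113.0/24"]` standing in for msk_public_access_cidrs, `client_allowed("203.0.113.7", ...)` is true and `client_allowed("198.51.100.1", ...)` is false. Note that `ipaddress.ip_network` rejects CIDRs with host bits set, which also catches typos in the list.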
Optional S3 export path
When the sink is enabled, Atlas provisions:
- one bucket for exported Kafka objects
- one plugin artifact bucket
- one MSK Connect custom plugin
- one MSK Connect connector
- one connector log group
Objects are written under the configured prefix and partitioned with the selected field names.
PostgreSQL path
The dashboard backend and Camunda each have a dedicated PostgreSQL instance with:
- its own subnet group
- its own parameter group
- its own security group
- either an RDS-managed master password secret or a static password, depending on the root settings
The application-facing DATABASE_URL for the dashboard backend and the JDBC connection values for Camunda are still owned by their respective runtime secrets, not injected automatically from Terraform outputs. For existing Atlas environments, Camunda captures the current password from the RDS-managed master secret and reuses that same value as the static RDS master password so the ECS service can keep using the same credentials without background rotation.
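Because DATABASE_URL is maintained by hand in the runtime secret, one way to assemble it from an RDS-managed master secret is sketched below. It assumes the secret JSON exposes `username` and `password` keys, as RDS-managed secrets do; the host, port, and database name are hypothetical placeholders, and the URL scheme expected by the dashboard backend is not confirmed by this document.

```python
import json
from urllib.parse import quote


def database_url(secret_json: str, host: str, port: int, dbname: str) -> str:
    """Build a PostgreSQL connection URL from a secret's JSON payload."""
    secret = json.loads(secret_json)
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
```

Percent-encoding matters here because generated passwords often contain characters that would otherwise be parsed as URL delimiters.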
ClickHouse Cloud metrics path
The clickhouse_prometheus_agent_enabled path provisions only observability infrastructure. It does not create or modify ClickHouse Cloud services. The agent uses the service-scoped ClickHouse Cloud Prometheus endpoint:
https://api.clickhouse.cloud/v1/organizations/<org-id>/services/<service-id>/prometheus?filtered_metrics=false
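As a small illustration of how the endpoint above is assembled from its two identifiers, the following sketch builds the URL from an organization ID and a service ID. The function name and the sample IDs are hypothetical; the path and the `filtered_metrics` query parameter are taken from the URL shown above.

```python
from urllib.parse import quote, urlencode

CLICKHOUSE_API_BASE = "https://api.clickhouse.cloud/v1"


def prometheus_url(org_id: str, service_id: str, filtered_metrics: bool = False) -> str:
    """Build the service-scoped ClickHouse Cloud Prometheus endpoint URL."""
    query = urlencode({"filtered_metrics": str(filtered_metrics).lower()})
    return (
        f"{CLICKHOUSE_API_BASE}/organizations/{quote(org_id, safe='')}"
        f"/services/{quote(service_id, safe='')}/prometheus?{query}"
    )
```

Calling `prometheus_url("org-1", "svc-1")` reproduces the documented shape with `filtered_metrics=false`.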
Credentials are stored in the dedicated clickhouse-prometheus-agent secret and injected into the ECS task. The ECS task runs the official Prometheus image and writes its runtime configuration from an inline task command. Metrics are sent to New Relic through Prometheus remote write, and Prometheus metric relabeling keeps only the ClickHouse and ClickPipes series required by the dashboard output from the root.
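The effect of a Prometheus `keep` relabel rule can be approximated offline, which is useful when checking whether a series would survive the filter. Prometheus anchors relabel regexes at both ends, so the sketch uses `re.fullmatch`. The pattern and the metric names below are assumptions for illustration only; the actual rule lives in the task's inline configuration and is not reproduced in this document.

```python
import re

# Hypothetical keep pattern; the real one is defined in the ECS task config.
KEEP_PATTERN = re.compile(r"(ClickHouse|ClickPipes).*")


def keep_series(metric_names: list[str]) -> list[str]:
    """Return the metric names a fully anchored 'keep' rule would retain."""
    return [name for name in metric_names if KEEP_PATTERN.fullmatch(name)]
```

Given illustrative names such as `ClickHouseProfileEvents_Query`, `ClickPipes_Info`, and `go_goroutines`, only the first two are retained, so the Prometheus process's own runtime metrics would not be forwarded to New Relic.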
OpenSpec vs current shipped code
The OpenSpec archive contains history around VPC Lattice and ClickPipes integration, but the current Terraform roots do not instantiate a dedicated VPC Lattice module. The shipped implementation today relies on public IAM + TLS outputs and environment-driven MSK connectivity settings.