Data and streaming

Atlas has two different data planes in this repository:

  • streaming through Amazon MSK
  • relational persistence through PostgreSQL RDS for the dashboard backend and Camunda

ClickHouse Cloud remains external to this Terraform stack, but staging can now observe one ClickHouse Cloud service through a dedicated Prometheus agent that scrapes the ClickHouse Cloud API and sends infrastructure metrics to New Relic.

terraform/staging2 does not create a second MSK cluster. It reuses the shared staging MSK cluster and its IAM + TLS broker path while keeping its own workload plane, database instances, cache, and secrets.

Kafka path

Component                | Current role
Events ingestion service | publishes Atlas events to Kafka
Dashboard backend        | internal Atlas client that can reach scoring over Service Connect and consume/produce against MSK when its runtime enables Kafka use
Scoring service          | internal-only service that processes deposit events and publishes scoring results to Kafka (producer-only on Kafka)
Amazon MSK               | shared Kafka backbone
Kafka UI                 | operator view into brokers, topics, and messages
MSK Connect S3 sink      | optional export path from Kafka topics to S3

The events secret template currently seeds:

  • KAFKA_EVENTS_TOPIC = "atlas.events.raw"
  • KAFKA_DLQ_TOPIC = "atlas.events.dlq"
  • KAFKA_SASL_MECHANISM = "AWS_MSK_IAM"
  • KAFKA_BROKERS = module.msk.bootstrap_brokers_tls
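
As a rough illustration of how a client might read the seeded values above, the sketch below parses them from an environment mapping and fails fast when one is missing. The names `KafkaSettings` and `load_kafka_settings` are hypothetical and are not taken from the Atlas services; the only assumption about the data is that KAFKA_BROKERS holds a comma-separated list of host:port pairs, which is the form MSK bootstrap strings take.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple

# The four keys seeded by the events secret template.
REQUIRED_KEYS = (
    "KAFKA_BROKERS",
    "KAFKA_EVENTS_TOPIC",
    "KAFKA_DLQ_TOPIC",
    "KAFKA_SASL_MECHANISM",
)


@dataclass(frozen=True)
class KafkaSettings:
    """Hypothetical container for the seeded Kafka values."""
    brokers: Tuple[str, ...]
    events_topic: str
    dlq_topic: str
    sasl_mechanism: str


def load_kafka_settings(env: Mapping[str, str]) -> KafkaSettings:
    """Read the seeded keys and raise KeyError if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing Kafka settings: " + ", ".join(missing))
    # Split the bootstrap string into individual host:port entries.
    brokers = tuple(b.strip() for b in env["KAFKA_BROKERS"].split(",") if b.strip())
    if not brokers:
        raise KeyError("KAFKA_BROKERS contains no broker addresses")
    return KafkaSettings(
        brokers=brokers,
        events_topic=env["KAFKA_EVENTS_TOPIC"],
        dlq_topic=env["KAFKA_DLQ_TOPIC"],
        sasl_mechanism=env["KAFKA_SASL_MECHANISM"],
    )
```

A service would typically call `load_kafka_settings(os.environ)` once at startup, so a misconfigured secret surfaces before any producer is created.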

The scoring secret template currently seeds the following values. Scoring is producer-only and no longer consumes any Kafka topic:

  • KAFKA_OUTPUT_TOPIC = "atlas.l3.user.score"
  • KAFKA_DLQ_TOPIC = "atlas.scoring.dlq"
  • KAFKA_SASL_MECHANISM = "AWS_MSK_IAM"
  • KAFKA_BROKERS = module.msk.bootstrap_brokers_tls
  • CAMUNDA_BASE_URL = "http://camunda:8080/engine-rest"

The dashboard backend Terraform code currently leaves this value commented out, to be added manually on the AWS side:

  • SCORING_BASE_URL = "http://scoring:8083"

The dashboard backend ECS task role now receives MSK IAM permissions from Terraform through the same msk_cluster_arn wiring used by the other internal Kafka clients. Kafka broker addresses and any application-level topic settings remain owned by the dashboard runtime secret.

When bringing up staging2, review topic names, consumer groups, and Kafka UI display labels manually so the second workload plane does not accidentally masquerade as the original staging consumers.
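
The manual review described above could be backed by a small check that flags names used by both workload planes. This is only a sketch: `find_shared_names` and the example names in the usage note are hypothetical, and an overlap is not necessarily wrong (a shared topic can be intentional), so the output is a list of items to inspect rather than a list of errors.

```python
from typing import Dict, Iterable, List, Mapping


def find_shared_names(
    staging: Mapping[str, Iterable[str]],
    staging2: Mapping[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per category, the names that appear in both workload planes.

    Categories are free-form keys such as "topics", "consumer_groups",
    or "kafka_ui_labels"; only categories with at least one overlap
    are included in the result.
    """
    shared: Dict[str, List[str]] = {}
    for category, names in staging2.items():
        overlap = sorted(set(names) & set(staging.get(category, ())))
        if overlap:
            shared[category] = overlap
    return shared
```

For example, passing `{"consumer_groups": ["atlas-dashboard"]}` for both planes returns `{"consumer_groups": ["atlas-dashboard"]}`, which signals that staging2 would join the original staging consumer group unless it is renamed.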

MSK connectivity modes

Endpoint type                | Use
bootstrap_brokers_tls        | internal IAM + TLS access for ECS services, dashboard backend, and Kafka UI
bootstrap_brokers_public_tls | public IAM + TLS access for external clients
multi-VPC connectivity       | enabled in production values for private connectivity patterns

The public bootstrap endpoint is controlled by msk_enable_public_access. When it is enabled, the MSK module sets broker public access to SERVICE_PROVIDED_EIPS; the VPC module separately controls which client CIDRs can reach port 9198 through msk_public_access_cidrs.
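
To make the CIDR gate concrete, the sketch below checks whether a client address would fall inside an allow-list shaped like msk_public_access_cidrs. It is an offline illustration using only the standard library; the function name `client_allowed` and the sample addresses are hypothetical, and the real enforcement happens in the security group rules, not in application code.

```python
import ipaddress
from typing import Iterable

# Port for public IAM + TLS access, as stated in the text above.
PUBLIC_IAM_PORT = 9198


def client_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip is inside any of the allowed CIDR blocks."""
    address = ipaddress.ip_address(client_ip)
    # An address never matches a network of a different IP version.
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

An empty allow-list therefore admits no client, which mirrors the effect of enabling public access without listing any client CIDRs.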

Optional S3 export path

When the sink is enabled, Atlas provisions:

  • one bucket for exported Kafka objects
  • one plugin artifact bucket
  • one MSK Connect custom plugin
  • one MSK Connect connector
  • one connector log group

Objects are written under the configured prefix and partitioned with the selected field names.
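
One way to picture the resulting layout is the sketch below. It assumes a Hive-style `field=value` directory per partition field under `<prefix>/<topic>/`, which is a common convention for S3 sinks but is not confirmed by this page; the function `export_prefix` and the field names in the usage note are hypothetical.

```python
from typing import Any, Mapping, Sequence


def export_prefix(
    base_prefix: str,
    topic: str,
    record: Mapping[str, Any],
    partition_fields: Sequence[str],
) -> str:
    """Build the assumed object prefix for one record.

    Layout (an assumption): <base_prefix>/<topic>/<field>=<value>/...
    """
    parts = [base_prefix.strip("/"), topic]
    for field in partition_fields:
        if field not in record:
            raise KeyError(f"record has no partition field {field!r}")
        parts.append(f"{field}={record[field]}")
    # Drop an empty base prefix so the result never starts with "/".
    return "/".join(part for part in parts if part)
```

With a prefix of `exports/`, the topic `atlas.events.raw`, and partition fields `event_type` and `region`, a record with values `deposit` and `eu` maps to `exports/atlas.events.raw/event_type=deposit/region=eu`.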

PostgreSQL path

The dashboard backend and Camunda each have a dedicated PostgreSQL instance with:

  • its own subnet group
  • its own parameter group
  • its own security group
  • either an RDS-managed master password secret or a static password, depending on the root settings

The application-facing DATABASE_URL for the dashboard backend and the JDBC connection values for Camunda are still owned by their respective runtime secrets, not injected automatically from Terraform outputs. For existing Atlas environments, Camunda captures the current password from the RDS-managed master secret and reuses that same value as the static RDS master password so the ECS service can keep using the same credentials without background rotation.
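
Because the connection string is maintained by hand in the runtime secret, it can help to see how one might be assembled. The sketch below assumes the master secret payload is JSON with `username` and `password` keys, which is the shape RDS-managed master secrets use; the function name `database_url`, the host, and the database name are hypothetical. Credentials are percent-encoded so that characters such as `@` or `/` in a password do not break the URL.

```python
import json
from urllib.parse import quote


def database_url(secret_json: str, host: str, database: str, port: int = 5432) -> str:
    """Compose a PostgreSQL URL from a JSON secret payload and endpoint details."""
    secret = json.loads(secret_json)
    # safe="" forces encoding of every reserved character, including "/".
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The resulting string would then be stored in the dashboard runtime secret by an operator; nothing in this sketch wires it to Terraform outputs.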

ClickHouse Cloud metrics path

The clickhouse_prometheus_agent_enabled path provisions only observability infrastructure. It does not create or modify ClickHouse Cloud services. The agent uses the service-scoped ClickHouse Cloud Prometheus endpoint:

https://api.clickhouse.cloud/v1/organizations/<org-id>/services/<service-id>/prometheus?filtered_metrics=false
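
The endpoint above can be assembled from its two identifiers as in the sketch below. The function name `prometheus_endpoint` and the sample IDs in the usage note are hypothetical; the URL shape and the `filtered_metrics` query parameter are taken directly from the endpoint shown above.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.clickhouse.cloud/v1"


def prometheus_endpoint(org_id: str, service_id: str, filtered_metrics: bool = False) -> str:
    """Return the service-scoped Prometheus scrape URL for one ClickHouse Cloud service."""
    path = (
        f"/organizations/{quote(org_id, safe='')}"
        f"/services/{quote(service_id, safe='')}/prometheus"
    )
    # The API expects a lowercase boolean literal in the query string.
    query = urlencode({"filtered_metrics": str(filtered_metrics).lower()})
    return f"{API_BASE}{path}?{query}"
```

Calling `prometheus_endpoint("org-1", "svc-1")` yields `https://api.clickhouse.cloud/v1/organizations/org-1/services/svc-1/prometheus?filtered_metrics=false`.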

Credentials are stored in the dedicated clickhouse-prometheus-agent secret and injected into the ECS task. The ECS task runs the official Prometheus image and writes its runtime configuration from an inline task command. Metrics are sent to New Relic through Prometheus remote write, and Prometheus metric relabeling keeps only the ClickHouse and ClickPipes series that the root's dashboard output requires.
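
The effect of a keep-style relabel rule can be imitated offline, as in the sketch below. The pattern and the metric names in the usage note are illustrative assumptions, not the rule shipped in the task command. Prometheus anchors relabel regexes at both ends, so the sketch uses `re.fullmatch` to reproduce that behavior.

```python
import re
from typing import Iterable, List

# Hypothetical keep pattern: any series whose name starts with one of these prefixes.
KEEP_PATTERN = r"(ClickHouse|ClickPipes).*"


def keep_series(metric_names: Iterable[str], pattern: str = KEEP_PATTERN) -> List[str]:
    """Return the metric names a fully anchored keep rule would retain."""
    compiled = re.compile(pattern)
    return [name for name in metric_names if compiled.fullmatch(name)]
```

Given `["ClickHouseMetrics_Query", "ClickPipes_Info", "go_goroutines"]`, the function keeps the first two names and drops the agent's own `go_goroutines` series.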

OpenSpec vs current shipped code

Warning: The OpenSpec archive contains history around VPC Lattice and ClickPipes integration, but the current Terraform roots do not instantiate a dedicated VPC Lattice module. The shipped implementation today relies on public IAM + TLS outputs and environment-driven MSK connectivity settings.