Network and security

Atlas uses one shared VPC per full environment root, split across two availability zones with explicit security group boundaries between the edge, workloads, database, and Kafka. The current exception is terraform/staging2, which reuses the VPC, subnets, and approved security groups already owned by terraform/staging instead of creating a second network foundation.

Network shape

Concern	Current implementation
Availability zones	2 dynamically selected AZs
Subnets	2 public and 2 private subnets
Internet egress	1 NAT gateway per public subnet
Private S3 path	S3 gateway VPC endpoint attached to private route tables
Flow logs	VPC Flow Logs to CloudWatch for rejected traffic in the committed roots

staging2 reuses the same two public and two private subnets from staging. It does not create a second NAT, route-table set, or VPC endpoint path.
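
The network shape in the table above could be expressed roughly as follows. This is a minimal sketch, not the Atlas module itself: every resource name, the CIDR range, the log group name, and the `aws_region` and `flow_log_role_arn` variables are assumptions made for illustration, and it assumes AWS provider v5 or later. The public route table is omitted to keep the example short.

```hcl
variable "aws_region" { type = string }        # hypothetical input
variable "flow_log_role_arn" { type = string } # hypothetical IAM role for flow log delivery

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # "Dynamically selected" here means: the first two AZs the region reports.
  azs = slice(data.aws_availability_zones.available.names, 0, 2)
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # assumed range
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
}

# One NAT gateway per public subnet.
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.main]
}

# Each private route table sends internet-bound traffic to the NAT in its AZ.
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# S3 gateway endpoint attached to the private route tables only.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id
}

# Flow logs for rejected traffic only, delivered to CloudWatch Logs.
resource "aws_cloudwatch_log_group" "flow" {
  name = "/example/vpc/flow-logs" # hypothetical name
}

resource "aws_flow_log" "rejects" {
  vpc_id          = aws_vpc.main.id
  traffic_type    = "REJECT"
  log_destination = aws_cloudwatch_log_group.flow.arn
  iam_role_arn    = var.flow_log_role_arn
}
```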

Security group model

Security group	Allows
`alb-sg`	inbound 80 and 443 from `alb_ingress_cidrs`, outbound unrestricted
`ecs-sg`	inbound 8080 only from `alb-sg`, outbound unrestricted
dashboard backend SG	inbound target port from `alb-sg`, outbound unrestricted; attached together with the shared `ecs-sg` for common internal egress paths such as MSK
scoring SG	inbound 8083 from `alb-sg` and dashboard backend SG, outbound unrestricted
Camunda SG	inbound 8080 only from scoring SG, outbound unrestricted
Valkey SG	inbound 6379 from the shared `ecs-sg` plus the dashboard backend, scoring, and Camunda SGs; outbound unrestricted
ClickHouse Prometheus agent SG	no inbound, outbound unrestricted for HTTPS egress to ClickHouse Cloud, New Relic, and AWS service endpoints through NAT
`msk-connect-sg`	no inbound, outbound unrestricted for connector workers
`msk-sg`	inbound 9098 from `ecs-sg` and `msk-connect-sg`, inbound 9198 from `msk_public_access_cidrs`
RDS SG	inbound 5432 from allowed CIDRs plus the dashboard backend or Camunda SG

staging2 attaches its duplicated workloads to these same security-group IDs. That means no *-sg2 copies exist for the shared edge, ECS, MSK, service, cache, or database paths.
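
As an illustration of the first two rows of the table, the edge-to-workload boundary can be sketched with security-group references rather than CIDRs. The resource labels and the `vpc_id` variable are hypothetical; only the group names, ports, and `alb_ingress_cidrs` come from this page. Terraform removes the default allow-all egress rule from groups it creates, so the unrestricted outbound rules are declared explicitly.

```hcl
variable "vpc_id" { type = string } # hypothetical input
variable "alb_ingress_cidrs" { type = list(string) }

resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "ecs" {
  name   = "ecs-sg"
  vpc_id = var.vpc_id
}

# alb-sg: inbound 80 and 443 from each allowed CIDR.
resource "aws_vpc_security_group_ingress_rule" "alb_http" {
  for_each          = toset(var.alb_ingress_cidrs)
  security_group_id = aws_security_group.alb.id
  cidr_ipv4         = each.value
  ip_protocol       = "tcp"
  from_port         = 80
  to_port           = 80
}

resource "aws_vpc_security_group_ingress_rule" "alb_https" {
  for_each          = toset(var.alb_ingress_cidrs)
  security_group_id = aws_security_group.alb.id
  cidr_ipv4         = each.value
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

# ecs-sg: inbound 8080 only from members of alb-sg.
resource "aws_vpc_security_group_ingress_rule" "ecs_from_alb" {
  security_group_id            = aws_security_group.ecs.id
  referenced_security_group_id = aws_security_group.alb.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}

# Outbound unrestricted on both groups.
resource "aws_vpc_security_group_egress_rule" "alb_all" {
  security_group_id = aws_security_group.alb.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
}

resource "aws_vpc_security_group_egress_rule" "ecs_all" {
  security_group_id = aws_security_group.ecs.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
}
```

The same pattern extends to the other rows: each "inbound from X SG" entry becomes one ingress rule whose `referenced_security_group_id` points at X.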

Ingress model

The ALB is internet-facing and sits in public subnets.
Port 80 redirects to 443.
The default HTTPS listener action forwards to the events ingestion target group.
Additional listener rules route the dashboard backend, scoring service, and Kafka UI by hostname.
Internal dashboard-backend-to-scoring traffic uses ECS Service Connect with the scoring alias instead of the public scoring hostname.
Internal scoring-to-Camunda traffic uses ECS Service Connect with the camunda client alias instead of the public ALB hostname.
Internal ECS workloads reach Valkey through the dedicated cache security group on 6379.
The Camunda Service Connect config sets per_request_timeout_seconds = 30 so that the scoring worker's long-poll requests can run longer than Service Connect's default per-request timeout for HTTP without being cut off.
The ClickHouse Prometheus agent has no inbound path and reaches api.clickhouse.cloud plus New Relic remote write over outbound HTTPS, leaving the private subnets through the NAT gateways.
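
The listener behavior and the Camunda Service Connect timeout described above might look like the following. Treat it as a sketch under assumptions: all variable names, resource labels, the rule priority, the `HTTP_301` redirect status, and the `http` port name are hypothetical. Only the ports, the `camunda` alias, and `per_request_timeout_seconds = 30` are taken from this page.

```hcl
variable "alb_arn" { type = string }
variable "certificate_arn" { type = string }
variable "events_target_group_arn" { type = string }
variable "scoring_target_group_arn" { type = string }
variable "scoring_hostname" { type = string }
variable "cluster_arn" { type = string }
variable "camunda_task_definition_arn" { type = string }
variable "service_connect_namespace" { type = string }

# Port 80 redirects to 443.
resource "aws_lb_listener" "http" {
  load_balancer_arn = var.alb_arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301" # assumed; the page does not state the status code
    }
  }
}

# The default HTTPS action forwards to the events ingestion target group.
resource "aws_lb_listener" "https" {
  load_balancer_arn = var.alb_arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = var.events_target_group_arn
  }
}

# One host-based rule per additional service; scoring shown as the example.
resource "aws_lb_listener_rule" "scoring" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 20 # hypothetical

  action {
    type             = "forward"
    target_group_arn = var.scoring_target_group_arn
  }

  condition {
    host_header {
      values = [var.scoring_hostname]
    }
  }
}

# Camunda publishes a Service Connect alias with a longer per-request timeout.
# network_configuration and other service settings are omitted for brevity.
resource "aws_ecs_service" "camunda" {
  name            = "camunda"
  cluster         = var.cluster_arn
  task_definition = var.camunda_task_definition_arn
  desired_count   = 1

  service_connect_configuration {
    enabled   = true
    namespace = var.service_connect_namespace

    service {
      port_name = "http" # must match a named port mapping in the task definition

      client_alias {
        dns_name = "camunda"
        port     = 8080
      }

      timeout {
        per_request_timeout_seconds = 30
      }
    }
  }
}
```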

Current exposure notes

ALB exposure

Both the staging example values and the committed production values currently set alb_ingress_cidrs = ["0.0.0.0/0"]. The module supports a tighter allow-list, but as committed the ALB accepts traffic from any IPv4 address.
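
For comparison, a tightened allow-list would be a values-only change. The CIDRs below are placeholders from the reserved documentation ranges, not real Atlas networks.

```hcl
# terraform.tfvars (hypothetical values)
alb_ingress_cidrs = [
  "203.0.113.0/24",   # placeholder for an office or VPN egress range
  "198.51.100.17/32", # placeholder for a single partner address
]
```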

MSK public access

The MSK module enables service-provided public broker EIPs when msk_enable_public_access = true. The VPC module separately opens inbound port 9198 on msk-sg to msk_public_access_cidrs, which controls which external client CIDRs can use the public IAM + TLS path.
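
The two halves of that arrangement can be sketched as below. The `msk_sg_id` variable, the local name, and the resource label are hypothetical. The variables `msk_enable_public_access` and `msk_public_access_cidrs` and port 9198 come from this page, and `SERVICE_PROVIDED_EIPS` and `DISABLED` are the values the AWS provider accepts for broker public access.

```hcl
variable "msk_sg_id" { type = string } # hypothetical input
variable "msk_enable_public_access" { type = bool }
variable "msk_public_access_cidrs" { type = list(string) }

locals {
  # MSK module side: the value that would be set at
  # broker_node_group_info.connectivity_info.public_access.type
  # on the aws_msk_cluster resource.
  msk_public_access_type = var.msk_enable_public_access ? "SERVICE_PROVIDED_EIPS" : "DISABLED"
}

# VPC module side: one inbound rule on msk-sg per allowed client CIDR.
# An empty list creates no rules, so the public path stays closed at the SG.
resource "aws_vpc_security_group_ingress_rule" "msk_public_iam" {
  for_each          = toset(var.msk_public_access_cidrs)
  security_group_id = var.msk_sg_id
  cidr_ipv4         = each.value
  ip_protocol       = "tcp"
  from_port         = 9198
  to_port           = 9198
}
```

Both settings have to line up for an external client to connect: the brokers need public EIPs, and the client's CIDR needs a matching rule on 9198.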

Dashboard database access

The staging example keeps RDS on a public subnet group with publicly_accessible = true and open CIDR defaults. Production committed values move the database to private subnets and disable public accessibility.
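
One way such a staging/production split could be driven by input values is sketched here. Apart from the `publicly_accessible` argument, every name is hypothetical: the variables, the subnet group, the instance sizing, and the choice of PostgreSQL, which is inferred only from port 5432 in the security group table.

```hcl
variable "rds_publicly_accessible" { type = bool } # hypothetical input
variable "public_subnet_ids" { type = list(string) }
variable "private_subnet_ids" { type = list(string) }
variable "rds_sg_id" { type = string }

# Public subnets when the instance is public (staging example),
# private subnets otherwise (production committed values).
resource "aws_db_subnet_group" "dashboard" {
  name       = "dashboard-db" # hypothetical
  subnet_ids = var.rds_publicly_accessible ? var.public_subnet_ids : var.private_subnet_ids
}

resource "aws_db_instance" "dashboard" {
  identifier                  = "dashboard-db" # hypothetical
  engine                      = "postgres"     # assumed from port 5432
  instance_class              = "db.t4g.micro" # hypothetical sizing
  allocated_storage           = 20
  username                    = "dashboard"    # hypothetical
  manage_master_user_password = true
  db_subnet_group_name        = aws_db_subnet_group.dashboard.name
  vpc_security_group_ids      = [var.rds_sg_id]
  publicly_accessible         = var.rds_publicly_accessible
  skip_final_snapshot         = true
}
```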

Operational implications

ECS tasks stay in private subnets with assign_public_ip = false.
The ElastiCache Valkey subnet group also stays in the private subnets and is not exposed publicly.
Hostname routing is managed at the ALB listener level, not inside a separate ingress service.
The dashboard backend joins the same Service Connect namespace as a client, but remains publicly reachable only through the ALB route.
Security hardening happens primarily through input values, not by changing the root module graph.
terraform/staging must remain the owner of the shared staging foundation outputs that terraform/staging2 consumes.
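
This page does not say how terraform/staging2 reads the shared foundation. One common mechanism is a `terraform_remote_state` data source, sketched below. The backend type, bucket, key, region, and all output names are assumptions, not the committed configuration.

```hcl
# In terraform/staging2 (hypothetical wiring)
data "terraform_remote_state" "staging" {
  backend = "s3"

  config = {
    bucket = "example-atlas-tfstate"     # hypothetical
    key    = "staging/terraform.tfstate" # hypothetical
    region = "us-east-1"                 # hypothetical
  }
}

locals {
  # Each of these must exist as an `output` in terraform/staging.
  vpc_id             = data.terraform_remote_state.staging.outputs.vpc_id
  private_subnet_ids = data.terraform_remote_state.staging.outputs.private_subnet_ids
  ecs_sg_id          = data.terraform_remote_state.staging.outputs.ecs_sg_id
}
```

Under this kind of wiring, renaming or removing an output in terraform/staging breaks the next plan in terraform/staging2, which is why ownership of those outputs has to stay with terraform/staging.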

Note

Some older OpenSpec pages describe stricter or different networking assumptions. The current Terraform code is the source of truth for what Atlas actually provisions today.
