Environment model
Atlas uses three Terraform roots with the same file structure and mostly the same module graph:
- terraform/staging
- terraform/staging2
- terraform/prod
The design goal is parity by structure and divergence by inputs, with one intentional shared-foundation exception for staging2.
Root comparison
| Concern | staging root | staging2 root | prod root |
|---|---|---|---|
| Folder | terraform/staging | terraform/staging2 | terraform/prod |
| Current naming defaults | project_name = "poc-atlas", environment = "dev" | project_name = "poc-atlas", environment = "dev" with duplicated workload resources suffixed by 2 | project_name = "atlas", environment = "prod" |
| State key | staging/terraform.tfstate | staging2/terraform.tfstate | prod/terraform.tfstate |
| Foundation ownership | owns its own VPC, subnets, security groups, MSK, and application ECR repositories | reuses the staging VPC, subnets, approved security groups, MSK, and application ECR repositories | owns its own independent production foundation |
| ALB and ECS plane | first staging workload plane | second staging workload plane with its own ALB and ECS cluster | independent production workload plane |
| MSK placement | msk_subnet_type = "public" in example values | reuses the staging MSK placement | msk_subnet_type = "public" in committed prod values while public access is enabled |
| MSK connectivity | msk_enable_public_access = true, msk_enable_multi_vpc_connectivity = false | reuses the staging shared IAM + TLS path on 9098 | msk_enable_public_access = true, msk_enable_multi_vpc_connectivity = true |
| RDS placement | dashboard and Camunda RDS use public subnet groups and public accessibility | its duplicated RDS instances attach to the shared staging subnets and shared approved security groups | dashboard and Camunda RDS use private subnet groups and no public accessibility |
| RDS durability | single-AZ, easy teardown defaults | same durability defaults as staging unless overridden in terraform/staging2/terraform.tfvars | Multi-AZ and deletion protection enabled |
| Scheduled operations | enabled: start weekdays at 08:00, stop daily at 20:00, cleanup Lambda kept for manual use; DevOps Agent investigation Lambda exists only as test wiring | not provisioned from this root | disabled by default |
| Image registry model | owns the shared staging application ECR repositories | reuses the same staging application ECR repositories and allows base task definitions to stay stale | owns its own production application ECR repositories |
| Log retention | 1-day defaults in active staging values | same 1-day defaults unless the root values override them | 3-day application and collector values in production, with 7 days for the events New Relic sidecar |
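
To make the "divergence by inputs" idea concrete, the two fragments below sketch how the naming and MSK rows of the table might look as root-level values. The variable names and values are the ones quoted in the table. The `terraform.tfvars` file name for staging and prod is an assumption (the table only names `terraform/staging2/terraform.tfvars`), and the real files will contain many more inputs.

```hcl
# terraform/staging/terraform.tfvars (file name assumed; values as listed in the table)
project_name = "poc-atlas"
environment  = "dev"

msk_subnet_type                   = "public" # from the example values
msk_enable_public_access          = true
msk_enable_multi_vpc_connectivity = false
```

```hcl
# terraform/prod/terraform.tfvars (file name assumed; values as listed in the table)
project_name = "atlas"
environment  = "prod"

msk_subnet_type                   = "public" # committed while public access is enabled
msk_enable_public_access          = true
msk_enable_multi_vpc_connectivity = true
```

Both fragments feed the same module graph. In this excerpt, only `project_name`, `environment`, and `msk_enable_multi_vpc_connectivity` differ between the two roots.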
What stays aligned
- All roots use the same shared modules under terraform/modules.
- All roots provision the same application functional areas: ALB, ECS services, RDS, secrets, internal service discovery, and monitoring.
- The operational pattern stays the same: bootstrap backend, initialize the root, make sure the correct application images already exist, apply infrastructure, then populate secrets.
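
The "bootstrap backend, initialize the root" steps rely on each root pointing at its own state key. A minimal sketch of what that could look like for the staging root follows. It assumes an S3 backend; the bucket name and region are placeholders, not values taken from the repository. Only the `key` comes from the comparison table.

```hcl
# Hypothetical backend block for terraform/staging.
terraform {
  backend "s3" {
    bucket = "example-atlas-tfstate"     # assumption: placeholder bucket name
    key    = "staging/terraform.tfstate" # state key from the root comparison table
    region = "us-east-1"                 # assumption: placeholder region
  }
}
```

Under the same assumptions, the staging2 and prod roots would differ only in `key` (`staging2/terraform.tfstate` and `prod/terraform.tfstate`), which keeps the three states isolated even though the roots share a file structure.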
What changes by environment
- Naming prefixes and hostnames
- ACM certificate IDs
- Network shape and allowed CIDRs
- MSK broker class and connectivity mode
- RDS placement, durability, and access
- Log retention and operational hardening
- Whether the root owns the foundation directly or consumes shared outputs from another root
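
The last item, consuming shared outputs from another root, is the staging2 case. One common Terraform way to do this is the `terraform_remote_state` data source. The sketch below shows that approach and does not describe how Atlas actually wires it. The S3 backend, bucket, region, and output names (`vpc_id`, `private_subnet_ids`) are all hypothetical.

```hcl
# Hypothetical: read the staging root's outputs from inside terraform/staging2.
data "terraform_remote_state" "staging" {
  backend = "s3"

  config = {
    bucket = "example-atlas-tfstate"     # assumption: placeholder bucket name
    key    = "staging/terraform.tfstate" # staging state key from the comparison table
    region = "us-east-1"                 # assumption: placeholder region
  }
}

locals {
  # Output names are illustrative; the staging root would need to export them.
  shared_vpc_id             = data.terraform_remote_state.staging.outputs.vpc_id
  shared_private_subnet_ids = data.terraform_remote_state.staging.outputs.private_subnet_ids
}
```

With a pattern like this, the consuming root never declares the VPC, subnets, or MSK cluster itself, so a destroy in staging2 cannot remove the foundation that staging owns.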
Note: The staging directory name reflects the environment role, while the default input values inside that root still preserve the older poc-atlas-dev naming convention. Treat that as current reality, not as an inconsistency to paper over in docs.