Inputs and variables
The root inputs live in terraform/staging/variables.tf, terraform/staging2/variables.tf, and terraform/prod/variables.tf. All three roots share the same interface, but staging2 intentionally reuses the VPC, public/private subnets, security groups, and MSK cluster already created by staging.
Identity and ownership
| Variable | Purpose | Current shape |
|---|---|---|
| aws_region | target AWS region | us-east-1 in both roots |
| project_name | naming and tagging prefix | poc-atlas in staging example, atlas in prod |
| environment | environment tag and naming suffix | dev in staging example, prod in prod |
| owner_email | alert subscription target and ownership tag | required in both roots |
Ingress and hostnames
| Variable | Purpose | Current shape |
|---|---|---|
| alb_certificate_id | ACM certificate for the ALB HTTPS listener | 140dbe42-48a0-467b-bbfd-1c0d69c5a492 in staging, fc539f7b-c3ab-4e90-882f-82392fc8b7a3 in prod |
| alb_ingress_cidrs | client allow-list for ALB HTTP/HTTPS | 0.0.0.0/0 in current staging example and prod values |
| alb_target_group_deregistration_delay_seconds | connection draining delay shared by all ALB-managed target groups | 10 seconds in staging, 300 seconds in prod |
| events_ingestion_host | host routed to the events API | atlas-ingest.twinfo.io in staging, atlas-ingest.lifters.tech in prod |
| dashboard_backend_host | host routed to the dashboard backend | atlas-back.twinfo.io in staging, atlas-back.lifters.tech in prod |
| kafka_ui_host | host routed to Kafka UI | atlas-kafka.twinfo.io in staging, atlas-kafka.lifters.tech in prod |
| *_listener_priority | ALB host-header rule priority | events 1, dashboard 2, Kafka UI 3 |
For staging2, the committed example uses atlas-ingest2.twinfo.io, atlas-back2.twinfo.io, and atlas-kafka2.twinfo.io with listener priorities 11, 12, and 13.
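As a rough illustration, the staging2 overrides described above could be written as the tfvars fragment below. The hostnames and priority numbers are the ones quoted from the committed example; the file layout is an assumption, and the exact `*_listener_priority` variable names are not spelled out in this page, so they appear only in a comment.

```hcl
# Hypothetical excerpt of a staging2 tfvars file. Only the values are taken
# from the committed example described above.
events_ingestion_host  = "atlas-ingest2.twinfo.io"
dashboard_backend_host = "atlas-back2.twinfo.io"
kafka_ui_host          = "atlas-kafka2.twinfo.io"

# The matching *_listener_priority variables are set to 11, 12, and 13,
# in the same order as the hosts above (events, dashboard, Kafka UI).
```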
Networking and environment shape
| Variable | Purpose | Current shape |
|---|---|---|
| vpc_cidr_block | primary VPC CIDR | 10.0.0.0/16 in staging example, 10.20.0.0/16 in prod |
| private_vpc_peering_routes | extra routes on each private route table | empty in staging example, defined in prod |
| vpc_flow_logs_retention_days | retention for /vpc/flow-logs | 1 day in staging example, 3 days in prod |
| vpc_flow_logs_traffic_type | traffic type captured by /vpc/flow-logs | REJECT in both committed roots |
| elasticache_valkey_replication_group_id | ElastiCache Valkey replication group identifier | atlas-redis-dev in committed staging values, atlas-redis in committed prod values |
| elasticache_valkey_subnet_group_name | ElastiCache Valkey subnet group name | atlas-elasticache-private-dev in committed staging values, atlas-elasticache-private in committed prod values |
| elasticache_valkey_engine_version | Valkey engine version | 8.2 |
| elasticache_valkey_node_type | ElastiCache node type | cache.t3.medium |
| elasticache_valkey_number_of_replicas | replicas used by the cluster-mode-disabled replication group | 1 in both committed roots |
| elasticache_valkey_snapshot_retention_limit | automatic snapshot retention in days | 1 in both committed roots |
| elasticache_valkey_auto_minor_version_upgrade | minor-version auto upgrade toggle | false in both committed roots |
| elasticache_valkey_multi_az_enabled | Multi-AZ toggle for the cache | true in both committed roots |
| elasticache_valkey_transit_encryption_enabled | in-transit encryption toggle | false in both committed roots |
| elasticache_valkey_at_rest_encryption_enabled | at-rest encryption toggle | false in both committed roots |
staging2 does not create a second VPC or a second MSK cluster. It creates a second workload plane and appends 2 to the duplicated resource identifiers while attaching those resources to the shared staging VPC, shared subnets, and shared security groups.
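One common way for a second root to attach to networking it does not own is to read it through data sources instead of declaring resources. The sketch below illustrates that idea only; it is an assumption about the mechanism, not a copy of the staging2 code, and the tag keys, tag values, and local names are all hypothetical.

```hcl
# Hypothetical lookup of the VPC that the staging root already created.
data "aws_vpc" "shared" {
  tags = {
    Name = "example-shared-vpc" # assumed tag value, for illustration only
  }
}

# Hypothetical lookup of the shared private subnets inside that VPC.
data "aws_subnets" "shared_private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private" # assumed tag, for illustration only
  }
}

locals {
  # Duplicated workload resources carry a "2" suffix, as described above.
  workload_suffix = "2"

  # Hypothetical example of a suffixed identifier built from a base name.
  example_service_name = "events-ingestion${local.workload_suffix}"
}
```

Reading shared infrastructure this way keeps ownership in one root: the consuming root can reference the VPC and subnets but cannot change or destroy them.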
Events and image bootstrap
| Variable | Purpose | Current shape |
|---|---|---|
| events_service_name_suffix | events ECS service suffix | events-ingestion |
| events_ecr_repository_suffix | events ECR repository suffix | events-ingestion |
| events_service_desired_count | steady-state desired count for the events ECS service | 1 in both roots |
| events_container_insights_setting | Container Insights mode for the shared ECS cluster | enabled in both committed roots |
| events_task_cpu | task-level CPU for the events workload | 1024 in the committed roots |
| events_task_memory | task-level memory for the events workload | 2048 MiB in the committed roots |
| events_app_container_cpu | CPU reserved for the events app container | 768 in the committed roots |
| events_app_container_memory_reservation | memory reservation for the events app container | 1024 MiB in the committed roots |
| events_enable_newrelic_sidecar | enables the optional newrelic-infra sidecar on the events task | true in both committed roots |
| events_newrelic_sidecar_image | pinned sidecar image reference | newrelic/nri-ecs:1.13.9 |
| events_newrelic_sidecar_cpu | CPU reserved for the New Relic sidecar | 256 |
| events_newrelic_sidecar_memory_reservation | memory reservation for the New Relic sidecar | 512 MiB |
| dashboard_backend_desired_count | steady-state desired count for the dashboard backend ECS service | 1 in both roots |
| scoring_desired_count | steady-state desired count for the scoring ECS service | 1 in both roots |
| camunda_desired_count | steady-state desired count for the Camunda ECS service | 1 in both roots |
| kafka_ui_desired_count | steady-state desired count for the Kafka UI ECS service | 1 in both roots |
| events_log_retention_days | events service log retention | 1 day in staging example, 3 days in prod |
| events_newrelic_log_retention_days | events New Relic sidecar log retention | 1 day in staging example, 7 days in prod |
| dashboard_backend_log_retention_days | dashboard service log retention | 1 day in staging example, 3 days in prod |
| scoring_log_retention_days | scoring service log retention | 1 day in staging example, 3 days in prod |
| camunda_log_retention_days | Camunda service log retention | 1 day in staging example, 3 days in prod |
| kafka_ui_log_retention_days | Kafka UI log retention | 1 day in staging example, 3 days in prod |
| camunda_image | upstream Camunda image reference | camunda/camunda-bpm-platform:7.22.0 |
| camunda_cpu_architecture | Camunda task architecture override | X86_64 in current committed values |
| clickhouse_prometheus_agent_enabled | enables the dedicated ECS Prometheus agent for ClickHouse Cloud metrics | true in staging and prod |
| clickhouse_prometheus_agent_image | Prometheus image used by the collector task | prom/prometheus:v3.11.2 |
| clickhouse_prometheus_agent_desired_count | desired ECS task count for the collector | 0 in both committed roots until the collector should actively run |
| clickhouse_prometheus_agent_cpu | CPU units for the collector task | 256 |
| clickhouse_prometheus_agent_memory | memory in MiB for the collector task | 512 |
| clickhouse_prometheus_agent_log_retention_days | collector CloudWatch log retention | 1 day in staging, 3 days in prod |
| clickhouse_prometheus_agent_scrape_interval | ClickHouse Cloud Prometheus scrape interval | 60s |
| events_newrelic_dashboard_enabled | enables management of the reusable New Relic ECS service dashboard | true in prod; staging uses the variable default unless overridden |
| events_newrelic_dashboard_name | dashboard name used for the events ingestion service dashboard | environment-specific; staging defaults to Atlas staging events dashboard |
| events_newrelic_dashboard_page_name | page name used inside the events ingestion service dashboard | environment-specific; staging defaults to Atlas staging dashboard |
| events_newrelic_apm_entity_guid | APM entity GUID used by throughput/error widgets in the events service dashboard | prod is pinned in committed values; staging is supplied when desired |
| ecs_service_newrelic_dashboards_enabled | enables New Relic dashboards for dashboard backend, scoring, Camunda, and Kafka UI from AWS CloudWatch metrics | true in committed roots |
| dashboard_backend_newrelic_apm_entity_guid | optional APM entity GUID that adds backend throughput/error widgets to the dashboard backend ECS dashboard | empty by default |
| rds_newrelic_dashboard_enabled | enables the shared RDS New Relic dashboard for dashboard and Camunda databases | true in committed roots |
| msk_newrelic_dashboard_enabled | enables the MSK New Relic dashboard | true in committed roots |
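The events task sizing in the table above has to add up: the app container and the optional sidecar share the task-level CPU and memory. The sketch below shows one way to guard that budget with a Terraform `check` block (available from Terraform 1.5). The variable names and values come from the table; the defaults, the `check` block itself, and the local names are assumptions for illustration and may not exist in the committed roots.

```hcl
variable "events_task_cpu" {
  type    = number
  default = 1024
}

variable "events_task_memory" {
  type    = number
  default = 2048 # MiB
}

variable "events_app_container_cpu" {
  type    = number
  default = 768
}

variable "events_app_container_memory_reservation" {
  type    = number
  default = 1024 # MiB
}

variable "events_enable_newrelic_sidecar" {
  type    = bool
  default = true
}

variable "events_newrelic_sidecar_cpu" {
  type    = number
  default = 256
}

variable "events_newrelic_sidecar_memory_reservation" {
  type    = number
  default = 512 # MiB
}

locals {
  # The sidecar only consumes budget when it is enabled.
  sidecar_cpu    = var.events_enable_newrelic_sidecar ? var.events_newrelic_sidecar_cpu : 0
  sidecar_memory = var.events_enable_newrelic_sidecar ? var.events_newrelic_sidecar_memory_reservation : 0
}

# Hypothetical guard: container reservations must fit inside the task size.
check "events_task_budget" {
  assert {
    condition     = var.events_app_container_cpu + local.sidecar_cpu <= var.events_task_cpu
    error_message = "Container CPU reservations exceed events_task_cpu."
  }

  assert {
    condition     = var.events_app_container_memory_reservation + local.sidecar_memory <= var.events_task_memory
    error_message = "Container memory reservations exceed events_task_memory."
  }
}
```

With the committed values, CPU is fully allocated (768 + 256 = 1024) while memory keeps 512 MiB of headroom (1024 + 512 = 1536 of 2048 MiB).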
New Relic AWS account integration
| Variable | Purpose | Current shape |
|---|---|---|
| newrelic_account_id | target New Relic account ID for the AWS linked account | 7848378 in committed roots |
| newrelic_region | New Relic account region | US in committed roots |
| newrelic_api_key | New Relic user API key used by the Terraform provider | supplied externally via TF_VAR_newrelic_api_key |
| newrelic_aws_integration_enabled | enables the AWS pull integration module in the target root | true in both committed roots |
| newrelic_aws_linked_account_name | display name for the linked AWS account inside New Relic | environment-specific, derived from the root prefix in committed values |
| newrelic_aws_trusted_account_id | AWS account ID used by New Relic to assume the integration role | 754728514883 in committed roots |
| newrelic_aws_regions | AWS regions covered by the pull integrations | ["us-east-1"] in committed roots |
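Because the API key is supplied through `TF_VAR_newrelic_api_key` rather than a committed file, the declaration and provider wiring might look roughly like the sketch below. The variable names and values come from the table; the `sensitive` flag, the defaults, and the provider block are assumptions about how the roots are wired, not a copy of them.

```hcl
variable "newrelic_account_id" {
  type    = number
  default = 7848378
}

variable "newrelic_region" {
  type    = string
  default = "US"
}

# No default: Terraform reads the value from the TF_VAR_newrelic_api_key
# environment variable. Marking it sensitive is an assumption; it keeps the
# key out of plan output.
variable "newrelic_api_key" {
  description = "New Relic user API key used by the Terraform provider"
  type        = string
  sensitive   = true
}

# Hypothetical provider wiring that consumes the three inputs above.
provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = var.newrelic_region
}
```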
MSK and sink controls
| Variable | Purpose | Current shape |
|---|---|---|
| msk_broker_instance_type | Kafka broker class | kafka.m7g.xlarge in active staging values, kafka.m5.large in prod |
| msk_ebs_volume_size_gib | EBS storage attached to each broker | 500 GiB in active staging values, 20 GiB in prod |
| msk_cloudwatch_enhanced_monitoring | CloudWatch enhanced monitoring level for MSK | DEFAULT in examples; PER_BROKER in active staging/prod values |
| msk_enable_multi_vpc_connectivity | enables multi-VPC connectivity support | false in staging example, true in prod |
| msk_enable_public_access | enables public broker endpoints with service-provided EIPs | true in both committed roots |
| msk_subnet_type | public or private broker placement | public in both committed roots while public access is enabled |
| msk_public_access_cidrs | CIDRs allowed to reach public IAM + TLS on 9198 | open in examples; restrict before apply when source ranges are known |
| enable_msk_s3_sink | enables optional MSK Connect to S3 | enabled in the committed examples |
| create_msk_connect_plugin_bucket | creates the plugin bucket | enabled |
| msk_s3_sink_plugin_file_key | required ZIP object key when sink is on | set only after upload |
| msk_s3_sink_topics_regex | topic selector regex | atlas\\.events\\..* |
| msk_s3_sink_partition_fields | S3 partitioning fields | organization_id, brand_id |
| msk_cleanup_topic_name_prefixes | prefixes used by the cleanup Lambda to discover ephemeral topics | ["atlas.events."] in both roots |
| msk_cleanup_topics | explicit topic definitions recreated by the cleanup Lambda after discovery-based cleanup | atlas.events.raw and atlas.events.dlq by default |
| msk_disk_usage_critical_threshold | critical threshold for native KafkaDataLogsDiskUsed alarms | 80 in the committed roots |
| msk_cpu_user_high_threshold | threshold for native CpuUser broker alarms | 60 in the committed roots |
| msk_memory_available_threshold_gib | low estimated available-memory threshold, converted to bytes against MemoryFree + MemoryCached + MemoryBuffered | 2 GiB in the committed roots |
| msk_swap_used_threshold_bytes | threshold for native SwapUsed broker alarms | 1 byte in the committed roots, alerting on any swap usage |
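Two rows in this table are easy to misread: the doubled backslashes in the sink regex and the GiB-to-bytes conversion for the memory alarm. The sketch below works through both in plain locals. The values come from the table; the local names, the flat list of topic names, and the added `^...$` anchors are assumptions for illustration.

```hcl
locals {
  # In an HCL string the backslashes are doubled, so this value is the
  # regular expression atlas\.events\..* once Terraform parses it.
  msk_s3_sink_topics_regex = "atlas\\.events\\..*"

  # Default topic names recreated by the cleanup Lambda (names only;
  # the real variable may carry richer topic definitions).
  cleanup_topic_names = ["atlas.events.raw", "atlas.events.dlq"]

  # Terraform's regex() matches anywhere in the string, so anchors are
  # added here to require that the whole topic name matches.
  topics_matched_by_sink = [
    for name in local.cleanup_topic_names : name
    if can(regex("^${local.msk_s3_sink_topics_regex}$", name))
  ]

  # 2 GiB expressed in bytes for comparison against
  # MemoryFree + MemoryCached + MemoryBuffered: 2 * 1024^3 = 2147483648.
  msk_memory_available_threshold_bytes = 2 * 1024 * 1024 * 1024
}
```

Both default cleanup topics start with `atlas.events.`, so both end up in `topics_matched_by_sink`.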
Monitoring controls
| Variable | Purpose | Current shape |
|---|---|---|
| monitoring_slack_notifications_enabled | creates the dedicated Slack SNS topic and Slack notifier Lambda | true in staging, false in prod |
| monitoring_slack_webhook_url | Slack incoming webhook URL supplied outside Git when Slack delivery is enabled | unset in the committed roots |
| monitoring_slack_log_retention_days | log retention for the dedicated Slack notifier Lambda | 7 days in the committed roots |
| ecs_cpu_utilization_high_threshold | threshold for ECS CPUUtilization alarms | 80 in the committed roots |
| ecs_memory_utilization_high_threshold | threshold for ECS MemoryUtilization alarms | 80 in the committed roots |
| rds_cpu_utilization_high_threshold | threshold for RDS CPUUtilization alarms | 80 in the committed roots |
| rds_db_load_high_threshold | threshold for RDS DBLoad alarms | 4 in the committed roots |
| rds_free_storage_space_low_threshold_gib | low free-storage threshold for RDS FreeStorageSpace, converted to bytes for the native metric alarm | 5 GiB in the committed roots |
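The free-storage threshold is entered in GiB but the native `FreeStorageSpace` metric reports bytes, so a conversion has to happen somewhere. A minimal sketch of that conversion follows. The variable name and the 5 GiB value come from the table; the default, the alarm name, the period, the statistic, and the DB identifier are hypothetical placeholders, not the committed alarm definition.

```hcl
variable "rds_free_storage_space_low_threshold_gib" {
  type    = number
  default = 5
}

locals {
  bytes_per_gib = 1024 * 1024 * 1024

  # 5 GiB -> 5 * 1073741824 = 5368709120 bytes.
  rds_free_storage_space_low_threshold_bytes = var.rds_free_storage_space_low_threshold_gib * local.bytes_per_gib
}

resource "aws_cloudwatch_metric_alarm" "rds_free_storage_low" {
  alarm_name          = "example-rds-free-storage-low" # hypothetical name
  namespace           = "AWS/RDS"
  metric_name         = "FreeStorageSpace"
  statistic           = "Average" # assumed statistic
  period              = 300       # assumed period, in seconds
  evaluation_periods  = 1         # assumed
  comparison_operator = "LessThanThreshold"
  threshold           = local.rds_free_storage_space_low_threshold_bytes

  dimensions = {
    DBInstanceIdentifier = "example-db" # hypothetical identifier
  }
}
```

Keeping the input in GiB makes the tfvars readable, while the alarm still compares like with like.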
Scheduled environment operations
| Variable | Purpose | Current shape |
|---|---|---|
| environment_operations_enabled | provisions the environment operation Lambdas and Scheduler resources | true in staging, false in prod |
| environment_operations_schedules_enabled | keeps the provisioned EventBridge schedules enabled instead of suspended | false in both roots |
| environment_operations_timezone | local timezone for the schedules | America/Fortaleza |
| environment_shutdown_time | daily local shutdown time | 20:00 |
| environment_start_time | weekday local startup time | 08:00 |
| environment_cleanup_time | daily local cleanup time for Kafka topic recreation | 20:30 |
| environment_operations_log_retention_days | log retention for the operation Lambdas | 7 days in staging example, 3 days in prod |
| environment_operations_alarm_investigation_enabled | test-only toggle for the SNS-driven DevOps Agent investigation Lambda inside environment-operations | test wiring only; not an implemented operational alerting path |
| environment_operations_alarm_investigation_webhook_url | DevOps Agent generic webhook URL passed to the test Lambda environment | test value in staging values, unset in prod |
| environment_operations_alarm_investigation_webhook_secret | DevOps Agent generic webhook HMAC secret passed to the test Lambda environment | test value in staging values, unset in prod |
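To make the schedule inputs concrete, the sketch below shows the EventBridge Scheduler cron expressions that correspond to the three local times in the table, plus the enabled/suspended switch. The times, timezone, and variable name come from the table; the local names are hypothetical, and how the module actually turns `HH:MM` strings into expressions is not shown here.

```hcl
variable "environment_operations_schedules_enabled" {
  type    = bool
  default = false
}

locals {
  # Passed to the schedule as its timezone, so the cron fields below are
  # local times rather than UTC.
  schedule_timezone = "America/Fortaleza"

  # Schedules stay provisioned but suspended unless the toggle is true.
  schedule_state = var.environment_operations_schedules_enabled ? "ENABLED" : "DISABLED"

  # Cron fields: minutes hours day-of-month month day-of-week year.
  shutdown_expression = "cron(0 20 * * ? *)"      # 20:00 every day
  start_expression    = "cron(0 8 ? * MON-FRI *)" # 08:00 Monday to Friday
  cleanup_expression  = "cron(30 20 * * ? *)"     # 20:30 every day
}
```

On an `aws_scheduler_schedule` resource these values would map to `schedule_expression`, `schedule_expression_timezone`, and `state`.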
Dashboard database controls
| Variable | Purpose | Current shape |
|---|---|---|
| dashboard_db_identifier | DB instance identifier suffix | rds-atlas-dashboard in staging example, rds in prod |
| dashboard_db_name | initial database name | atlas_dashboard |
| dashboard_db_instance_class | RDS instance size | db.t4g.micro in staging example, db.t3.medium in prod |
| dashboard_db_subnet_group_type | public or private subnet group | public in staging example, private in prod |
| dashboard_db_publicly_accessible | public accessibility flag | true in staging example, false in prod |
| dashboard_db_multi_az | Multi-AZ toggle | false in staging example, true in prod |
| dashboard_db_allowed_cidr_blocks | CIDRs allowed to reach PostgreSQL | open in staging example, constrained in prod |
| dashboard_db_deletion_protection | destroy protection | false in staging example, true in prod |
| dashboard_db_skip_final_snapshot | destroy-time snapshot behavior | true in staging example, false in prod |
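Read as a tfvars fragment, the staging column of the table above would look roughly like this, with the prod value noted beside each line where the two differ. The values are taken from the table; the fragment is illustrative, and `dashboard_db_allowed_cidr_blocks` is left out because the table does not list its concrete CIDRs.

```hcl
# Staging example values; prod differences are noted in the comments.
dashboard_db_identifier          = "rds-atlas-dashboard" # prod: "rds"
dashboard_db_name                = "atlas_dashboard"
dashboard_db_instance_class      = "db.t4g.micro" # prod: "db.t3.medium"
dashboard_db_subnet_group_type   = "public"       # prod: "private"
dashboard_db_publicly_accessible = true           # prod: false
dashboard_db_multi_az            = false          # prod: true
dashboard_db_deletion_protection = false          # prod: true
dashboard_db_skip_final_snapshot = true           # prod: false
```

Every safety-related flag flips between the two environments: staging favors cheap, disposable access, and prod favors isolation and recoverability.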
Camunda database controls
| Variable | Purpose | Current shape |
|---|---|---|
| camunda_db_identifier | DB instance identifier suffix | rds-atlas-camunda |
| camunda_db_name | initial database name | camunda |
| camunda_db_instance_class | RDS instance size | db.t4g.micro in both roots today |
| camunda_db_subnet_group_type | public or private subnet group | public in staging example, private in prod |
| camunda_db_publicly_accessible | public accessibility flag | true in staging example, false in prod |
| camunda_db_multi_az | Multi-AZ toggle | false in staging example, true in prod |
| camunda_db_allowed_cidr_blocks | CIDRs allowed to reach PostgreSQL | empty by default; application path is the main access model |
| camunda_db_deletion_protection | destroy protection | false in staging example, true in prod |
| camunda_db_skip_final_snapshot | destroy-time snapshot behavior | true in staging example, false in prod |
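The subnet group type and the public accessibility flag move together in both environments, which suggests a guard that keeps them consistent. The sketch below is one possible guard, not something confirmed to exist in the committed roots: the `validation` and `check` blocks are assumptions, and only the variable names and the public/private choice come from the table.

```hcl
variable "camunda_db_subnet_group_type" {
  type = string

  # Only the two placements named in the table are accepted.
  validation {
    condition     = contains(["public", "private"], var.camunda_db_subnet_group_type)
    error_message = "camunda_db_subnet_group_type must be \"public\" or \"private\"."
  }
}

variable "camunda_db_publicly_accessible" {
  type = bool
}

# Hypothetical consistency guard (check blocks need Terraform 1.5 or later):
# a publicly accessible instance only makes sense in the public subnet group.
check "camunda_db_exposure" {
  assert {
    condition     = !var.camunda_db_publicly_accessible || var.camunda_db_subnet_group_type == "public"
    error_message = "camunda_db_publicly_accessible requires the public subnet group."
  }
}
```

Both committed combinations pass this guard: staging is public and publicly accessible, prod is private and not publicly accessible.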
When in doubt, compare terraform/staging/terraform.tfvars.example with terraform/prod/production.auto.tfvars. The comparison shows most clearly where Atlas keeps the environments at parity and where it deliberately gives them different safety and access postures.