Inputs and variables

The root inputs live in `terraform/staging/variables.tf`, `terraform/staging2/variables.tf`, and `terraform/prod/variables.tf`. All three roots share the same variable interface, but staging2 intentionally reuses the VPC, public and private subnets, security groups, and MSK cluster that staging already creates. Where the tables below say "both roots", they compare staging with prod.

Identity and ownership

Variable	Purpose	Current shape
`aws_region`	target AWS region	`us-east-1` in both roots
`project_name`	naming and tagging prefix	`poc-atlas` in staging example, `atlas` in prod
`environment`	environment tag and naming suffix	`dev` in staging example, `prod` in prod
`owner_email`	alert subscription target and ownership tag	required in both roots
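
A staging-style `terraform.tfvars` fragment for these inputs could look like the sketch below. The values come from the staging example described in the table; the `owner_email` address is a hypothetical placeholder.

```hcl
# Identity and ownership inputs, following the staging example.
aws_region   = "us-east-1"
project_name = "poc-atlas"
environment  = "dev"

# Required in every root: receives alert subscriptions and sets the owner tag.
# Hypothetical address; replace it with the real owning mailbox.
owner_email = "platform-team@example.com"
```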

Ingress and hostnames

Variable	Purpose	Current shape
`alb_certificate_id`	ACM certificate for the ALB HTTPS listener	`140dbe42-48a0-467b-bbfd-1c0d69c5a492` in staging, `fc539f7b-c3ab-4e90-882f-82392fc8b7a3` in prod
`alb_ingress_cidrs`	client allow-list for ALB HTTP/HTTPS	`0.0.0.0/0` in the current staging example and in prod values
`alb_target_group_deregistration_delay_seconds`	connection draining delay shared by all ALB-managed target groups	`10` seconds in staging, `300` seconds in prod
`events_ingestion_host`	host routed to the events API	`atlas-ingest.twinfo.io` in staging, `atlas-ingest.lifters.tech` in prod
`dashboard_backend_host`	host routed to the dashboard backend	`atlas-back.twinfo.io` in staging, `atlas-back.lifters.tech` in prod
`kafka_ui_host`	host routed to Kafka UI	`atlas-kafka.twinfo.io` in staging, `atlas-kafka.lifters.tech` in prod
`*_listener_priority`	ALB host-header rule priority	events 1, dashboard 2, Kafka UI 3

For staging2, the committed example uses `atlas-ingest2.twinfo.io`, `atlas-back2.twinfo.io`, and `atlas-kafka2.twinfo.io` with listener priorities 11, 12, and 13.
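
The staging2 hostnames can be expressed as a `terraform.tfvars` fragment like this sketch. Only the hostname values come from the committed example; the listener priorities stay in a comment because the full variable names behind `*_listener_priority` are not spelled out here. ALB requires rule priorities to be unique within a listener, so keeping staging2 at 11 to 13 avoids colliding with staging's 1 to 3 if both sets of rules ever share a listener.

```hcl
# staging2 ingress hostnames from the committed example.
events_ingestion_host  = "atlas-ingest2.twinfo.io"
dashboard_backend_host = "atlas-back2.twinfo.io"
kafka_ui_host          = "atlas-kafka2.twinfo.io"

# The matching *_listener_priority inputs are 11 (events), 12 (dashboard),
# and 13 (Kafka UI); their exact variable names are not repeated here.
```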

Networking and environment shape

Variable	Purpose	Current shape
`vpc_cidr_block`	primary VPC CIDR	`10.0.0.0/16` in staging example, `10.20.0.0/16` in prod
`private_vpc_peering_routes`	extra routes on each private route table	empty in staging example, defined in prod
`vpc_flow_logs_retention_days`	retention for `/vpc/flow-logs`	1 day in staging example, 3 days in prod
`vpc_flow_logs_traffic_type`	traffic type captured by `/vpc/flow-logs`	`REJECT` in both committed roots
`elasticache_valkey_replication_group_id`	ElastiCache Valkey replication group identifier	`atlas-redis-dev` in committed staging values, `atlas-redis` in committed prod values
`elasticache_valkey_subnet_group_name`	ElastiCache Valkey subnet group name	`atlas-elasticache-private-dev` in committed staging values, `atlas-elasticache-private` in committed prod values
`elasticache_valkey_engine_version`	Valkey engine version	`8.2`
`elasticache_valkey_node_type`	ElastiCache node type	`cache.t3.medium`
`elasticache_valkey_number_of_replicas`	number of replicas in the cluster-mode-disabled replication group	`1` in both committed roots
`elasticache_valkey_snapshot_retention_limit`	automatic snapshot retention in days	`1` in both committed roots
`elasticache_valkey_auto_minor_version_upgrade`	minor-version auto upgrade toggle	`false` in both committed roots
`elasticache_valkey_multi_az_enabled`	Multi-AZ toggle for the cache	`true` in both committed roots
`elasticache_valkey_transit_encryption_enabled`	in-transit encryption toggle	`false` in both committed roots
`elasticache_valkey_at_rest_encryption_enabled`	at-rest encryption toggle	`false` in both committed roots
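
As an illustration, the prod column of the table above maps to a tfvars fragment like the following sketch. `private_vpc_peering_routes` is omitted because its value shape is not described in this section. ElastiCache only supports Multi-AZ when the replication group has at least one replica, so the committed single replica is consistent with `elasticache_valkey_multi_az_enabled = true`.

```hcl
# Networking inputs as committed for prod.
vpc_cidr_block               = "10.20.0.0/16"
vpc_flow_logs_retention_days = 3
vpc_flow_logs_traffic_type   = "REJECT"

# ElastiCache Valkey inputs as committed for prod.
elasticache_valkey_replication_group_id       = "atlas-redis"
elasticache_valkey_subnet_group_name          = "atlas-elasticache-private"
elasticache_valkey_engine_version             = "8.2"
elasticache_valkey_node_type                  = "cache.t3.medium"
elasticache_valkey_number_of_replicas         = 1 # Multi-AZ needs at least one replica
elasticache_valkey_snapshot_retention_limit   = 1
elasticache_valkey_auto_minor_version_upgrade = false
elasticache_valkey_multi_az_enabled           = true
elasticache_valkey_transit_encryption_enabled = false
elasticache_valkey_at_rest_encryption_enabled = false
```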

staging2 does not create a second VPC or a second MSK cluster. Instead, it creates a second workload plane whose duplicated resource identifiers carry a `2` suffix, and it attaches those resources to the shared staging VPC, subnets, and security groups.

Events and image bootstrap

Variable	Purpose	Current shape
`events_service_name_suffix`	events ECS service suffix	`events-ingestion`
`events_ecr_repository_suffix`	events ECR repository suffix	`events-ingestion`
`events_service_desired_count`	steady-state desired count for the events ECS service	`1` in both roots
`events_container_insights_setting`	Container Insights mode for the shared ECS cluster	`enabled` in both committed roots
`events_task_cpu`	task-level CPU for the events workload	`1024` CPU units in the committed roots
`events_task_memory`	task-level memory for the events workload	`2048` MiB in the committed roots
`events_app_container_cpu`	CPU reserved for the events app container	`768` CPU units in the committed roots
`events_app_container_memory_reservation`	memory reservation for the events app container	`1024` MiB in the committed roots
`events_enable_newrelic_sidecar`	enables the optional `newrelic-infra` sidecar on the events task	`true` in both committed roots
`events_newrelic_sidecar_image`	pinned sidecar image reference	`newrelic/nri-ecs:1.13.9`
`events_newrelic_sidecar_cpu`	CPU reserved for the New Relic sidecar	`256` CPU units
`events_newrelic_sidecar_memory_reservation`	memory reservation for the New Relic sidecar	`512` MiB
`dashboard_backend_desired_count`	steady-state desired count for the dashboard backend ECS service	`1` in both roots
`scoring_desired_count`	steady-state desired count for the scoring ECS service	`1` in both roots
`camunda_desired_count`	steady-state desired count for the Camunda ECS service	`1` in both roots
`kafka_ui_desired_count`	steady-state desired count for the Kafka UI ECS service	`1` in both roots
`events_log_retention_days`	events service log retention	1 day in staging example, 3 days in prod
`events_newrelic_log_retention_days`	events New Relic sidecar log retention	1 day in staging example, 7 days in prod
`dashboard_backend_log_retention_days`	dashboard service log retention	1 day in staging example, 3 days in prod
`scoring_log_retention_days`	scoring service log retention	1 day in staging example, 3 days in prod
`camunda_log_retention_days`	Camunda service log retention	1 day in staging example, 3 days in prod
`kafka_ui_log_retention_days`	Kafka UI log retention	1 day in staging example, 3 days in prod
`camunda_image`	upstream Camunda image reference	`camunda/camunda-bpm-platform:7.22.0`
`camunda_cpu_architecture`	Camunda task architecture override	`X86_64` in current committed values
`clickhouse_prometheus_agent_enabled`	enables the dedicated ECS Prometheus agent for ClickHouse Cloud metrics	`true` in staging and prod
`clickhouse_prometheus_agent_image`	Prometheus image used by the collector task	`prom/prometheus:v3.11.2`
`clickhouse_prometheus_agent_desired_count`	desired ECS task count for the collector	`0` in both committed roots; raise it when the collector should actively run
`clickhouse_prometheus_agent_cpu`	CPU units for the collector task	`256`
`clickhouse_prometheus_agent_memory`	memory in MiB for the collector task	`512`
`clickhouse_prometheus_agent_log_retention_days`	collector CloudWatch log retention	1 day in staging, 3 days in prod
`clickhouse_prometheus_agent_scrape_interval`	ClickHouse Cloud Prometheus scrape interval	`60s`
`events_newrelic_dashboard_enabled`	enables management of the reusable New Relic ECS service dashboard	`true` in prod; staging uses the variable default unless overridden
`events_newrelic_dashboard_name`	dashboard name used for the events ingestion service dashboard	environment-specific; staging defaults to `Atlas staging events dashboard`
`events_newrelic_dashboard_page_name`	page name used inside the events ingestion service dashboard	environment-specific; staging defaults to `Atlas staging dashboard`
`events_newrelic_apm_entity_guid`	APM entity GUID used by the throughput and error widgets in the events service dashboard	pinned in committed prod values; supplied for staging only when needed
`ecs_service_newrelic_dashboards_enabled`	enables New Relic dashboards for dashboard backend, scoring, Camunda, and Kafka UI from AWS CloudWatch metrics	`true` in committed roots
`dashboard_backend_newrelic_apm_entity_guid`	optional APM entity GUID that adds backend throughput/error widgets to the dashboard backend ECS dashboard	empty by default
`rds_newrelic_dashboard_enabled`	enables the shared RDS New Relic dashboard for dashboard and Camunda databases	`true` in committed roots
`msk_newrelic_dashboard_enabled`	enables the MSK New Relic dashboard	`true` in committed roots
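
The events task sizing in the table fits together as in the following sketch. Container-level CPU and memory reservations are expected to fit within the task-level values, and the committed numbers give the New Relic sidecar exactly the CPU the app container does not reserve. The `clickhouse_prometheus_agent_desired_count = 1` line is a hypothetical override showing how the collector would be switched on; the committed value is `0`.

```hcl
# Events task sizing as committed. Container reservations fit inside the task:
#   CPU:    768 (app) + 256 (sidecar) = 1024 CPU units = events_task_cpu
#   Memory: 1024 MiB (app) + 512 MiB (sidecar) = 1536 MiB <= 2048 MiB
events_task_cpu                         = 1024
events_task_memory                      = 2048
events_app_container_cpu                = 768
events_app_container_memory_reservation = 1024

events_enable_newrelic_sidecar             = true
events_newrelic_sidecar_image              = "newrelic/nri-ecs:1.13.9"
events_newrelic_sidecar_cpu                = 256
events_newrelic_sidecar_memory_reservation = 512

# Hypothetical override: the collector is provisioned but idle at 0 tasks;
# raising the desired count is what makes it actively scrape ClickHouse Cloud.
clickhouse_prometheus_agent_enabled       = true
clickhouse_prometheus_agent_desired_count = 1
```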

New Relic AWS account integration

Variable	Purpose	Current shape
`newrelic_account_id`	target New Relic account ID for the AWS linked account	`7848378` in committed roots
`newrelic_region`	New Relic account region	`US` in committed roots
`newrelic_api_key`	New Relic user API key used by the Terraform provider	supplied externally via `TF_VAR_newrelic_api_key`
`newrelic_aws_integration_enabled`	enables the AWS pull integration module in the target root	`true` in both committed roots
`newrelic_aws_linked_account_name`	display name for the linked AWS account inside New Relic	environment-specific, derived from the root prefix in committed values
`newrelic_aws_trusted_account_id`	AWS account ID used by New Relic to assume the integration role	`754728514883` in committed roots
`newrelic_aws_regions`	AWS regions covered by the pull integrations	`["us-east-1"]` in committed roots
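
A minimal sketch of how these inputs could feed the New Relic Terraform provider follows. It assumes the root passes the values straight to the `newrelic/newrelic` provider's `account_id`, `api_key`, and `region` arguments; the variable declarations are illustrative, not copied from `variables.tf`. Declaring `newrelic_api_key` as a sensitive variable without a default, and setting it through `TF_VAR_newrelic_api_key`, keeps the key out of Git and redacted in plan output.

```hcl
terraform {
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# Illustrative declarations; the real ones live in the root's variables.tf.
variable "newrelic_account_id" {
  type = number
}

variable "newrelic_region" {
  type    = string
  default = "US"
}

# No default: supply it with `export TF_VAR_newrelic_api_key=...` before plan/apply.
variable "newrelic_api_key" {
  type      = string
  sensitive = true
}

provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = var.newrelic_region
}
```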

MSK and sink controls

Variable	Purpose	Current shape
`msk_broker_instance_type`	MSK broker instance type	`kafka.m7g.xlarge` in active staging values, `kafka.m5.large` in prod
`msk_ebs_volume_size_gib`	EBS storage attached to each broker	`500` GiB in active staging values, `20` GiB in prod
`msk_cloudwatch_enhanced_monitoring`	CloudWatch enhanced monitoring level for MSK	`DEFAULT` in examples; `PER_BROKER` in active staging/prod values
`msk_enable_multi_vpc_connectivity`	enables multi-VPC connectivity support	`false` in staging example, `true` in prod
`msk_enable_public_access`	enables public broker endpoints with service-provided EIPs	`true` in both committed roots
`msk_subnet_type`	public or private broker placement	`public` in both committed roots while public access is enabled
`msk_public_access_cidrs`	CIDRs allowed to reach the public IAM-authenticated TLS listener on port 9198	open in examples; restrict before apply once source ranges are known
`enable_msk_s3_sink`	enables the optional MSK Connect S3 sink	enabled in the committed examples
`create_msk_connect_plugin_bucket`	creates the S3 bucket that holds the MSK Connect plugin	enabled
`msk_s3_sink_plugin_file_key`	S3 object key of the plugin ZIP; required when the sink is enabled	set only after the ZIP is uploaded
`msk_s3_sink_topics_regex`	topic selector regex	`atlas\\.events\\..*`
`msk_s3_sink_partition_fields`	S3 partitioning fields	`organization_id`, `brand_id`
`msk_cleanup_topic_name_prefixes`	prefixes used by the cleanup Lambda to discover ephemeral topics	`["atlas.events."]` in both roots
`msk_cleanup_topics`	explicit topic definitions recreated by the cleanup Lambda after discovery-based cleanup	`atlas.events.raw` and `atlas.events.dlq` by default
`msk_disk_usage_critical_threshold`	critical threshold for native `KafkaDataLogsDiskUsed` alarms	`80` in the committed roots
`msk_cpu_user_high_threshold`	threshold for native `CpuUser` broker alarms	`60` in the committed roots
`msk_memory_available_threshold_gib`	threshold for low estimated available memory; converted to bytes and compared against `MemoryFree + MemoryCached + MemoryBuffered`	`2` GiB in the committed roots
`msk_swap_used_threshold_bytes`	threshold for native `SwapUsed` broker alarms	`1` byte in the committed roots, alerting on any swap usage
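
The estimated available-memory check can be sketched as a CloudWatch metric-math alarm over the native `AWS/Kafka` broker metrics. This is an illustration, not the module's actual resource: the `local.msk_cluster_name` value and the single broker ID `1` are hypothetical, and it assumes `var.msk_memory_available_threshold_gib` is declared as in the root. With the committed value of `2`, the threshold becomes 2 × 1024³ = 2,147,483,648 bytes.

```hcl
locals {
  msk_cluster_name = "atlas-msk" # hypothetical; use the real MSK cluster name
}

# Sketch: alarm when free + cached + buffered memory on one broker drops too low.
resource "aws_cloudwatch_metric_alarm" "msk_memory_available_low" {
  alarm_name          = "msk-broker-1-memory-available-low"
  alarm_description   = "Estimated available memory (free + cached + buffered) below threshold"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 3
  # GiB input converted to bytes, the unit of the native MSK memory metrics.
  threshold = var.msk_memory_available_threshold_gib * 1024 * 1024 * 1024

  # The math expression is the only query whose result drives the alarm.
  metric_query {
    id          = "available"
    expression  = "free + cached + buffered"
    label       = "Estimated available memory (bytes)"
    return_data = true
  }

  # One raw query per native metric; the map keys become the expression ids.
  dynamic "metric_query" {
    for_each = {
      free     = "MemoryFree"
      cached   = "MemoryCached"
      buffered = "MemoryBuffered"
    }
    content {
      id          = metric_query.key
      return_data = false
      metric {
        namespace   = "AWS/Kafka"
        metric_name = metric_query.value
        period      = 300
        stat        = "Average"
        dimensions = {
          "Cluster Name" = local.msk_cluster_name
          "Broker ID"    = "1" # hypothetical single broker
        }
      }
    }
  }
}
```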

Monitoring controls

Variable	Purpose	Current shape
`monitoring_slack_notifications_enabled`	creates the dedicated Slack SNS topic and Slack notifier Lambda	`true` in staging, `false` in prod
`monitoring_slack_webhook_url`	Slack incoming webhook URL supplied outside Git when Slack delivery is enabled	unset in the committed roots
`monitoring_slack_log_retention_days`	log retention for the dedicated Slack notifier Lambda	`7` days in the committed roots
`ecs_cpu_utilization_high_threshold`	threshold for ECS `CPUUtilization` alarms	`80` in the committed roots
`ecs_memory_utilization_high_threshold`	threshold for ECS `MemoryUtilization` alarms	`80` in the committed roots
`rds_cpu_utilization_high_threshold`	threshold for RDS `CPUUtilization` alarms	`80` in the committed roots
`rds_db_load_high_threshold`	threshold for RDS `DBLoad` alarms	`4` in the committed roots
`rds_free_storage_space_low_threshold_gib`	low free-storage threshold for RDS `FreeStorageSpace`, converted to bytes for the native metric alarm	`5` GiB in the committed roots
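
RDS reports `FreeStorageSpace` in bytes, so the GiB input has to be converted before it becomes an alarm threshold; the committed `5` GiB becomes 5,368,709,120 bytes. A minimal sketch, assuming a hypothetical `aws_db_instance.dashboard` resource and that `var.rds_free_storage_space_low_threshold_gib` is declared as in the root:

```hcl
# Sketch: low free-storage alarm for one database instance.
resource "aws_cloudwatch_metric_alarm" "dashboard_db_free_storage_low" {
  alarm_name          = "dashboard-db-free-storage-low"
  namespace           = "AWS/RDS"
  metric_name         = "FreeStorageSpace"
  statistic           = "Minimum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "LessThanThreshold"
  # GiB input converted to bytes (5 GiB -> 5368709120 bytes).
  threshold = var.rds_free_storage_space_low_threshold_gib * 1024 * 1024 * 1024

  dimensions = {
    # Hypothetical resource address for the dashboard database.
    DBInstanceIdentifier = aws_db_instance.dashboard.identifier
  }
}
```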

Scheduled environment operations

Variable	Purpose	Current shape
`environment_operations_enabled`	provisions the environment operation Lambdas and Scheduler resources	`true` in staging, `false` in prod
`environment_operations_schedules_enabled`	sets the provisioned EventBridge schedules to enabled instead of disabled	`false` in both roots, so any provisioned schedules stay disabled
`environment_operations_timezone`	local timezone for the schedules	`America/Fortaleza`
`environment_shutdown_time`	daily local shutdown time	`20:00`
`environment_start_time`	weekday local startup time	`08:00`
`environment_cleanup_time`	daily local cleanup time for Kafka topic recreation	`20:30`
`environment_operations_log_retention_days`	log retention for the operation Lambdas	`7` days in staging example, `3` days in prod
`environment_operations_alarm_investigation_enabled`	test-only toggle for the SNS-driven DevOps Agent investigation Lambda inside `environment-operations`	test wiring only; not an implemented operational alerting path
`environment_operations_alarm_investigation_webhook_url`	DevOps Agent generic webhook URL passed to the test Lambda environment	test value in staging values, unset in prod
`environment_operations_alarm_investigation_webhook_secret`	DevOps Agent generic webhook HMAC secret passed to the test Lambda environment	test value in staging values, unset in prod
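
A sketch of how the timezone and local times could become EventBridge Scheduler schedules follows. The `"HH:MM"` strings are split into cron fields; with the committed values, `local.shutdown_cron` evaluates to `cron(0 20 * * ? *)` and `local.start_cron` to `cron(0 8 ? * MON-FRI *)`. The schedule name and the Lambda and IAM role addresses are hypothetical.

```hcl
locals {
  # "HH:MM" strings split into hour and minute parts.
  shutdown_parts = split(":", var.environment_shutdown_time) # ["20", "00"]
  start_parts    = split(":", var.environment_start_time)    # ["08", "00"]

  # Scheduler cron fields: minutes hours day-of-month month day-of-week year.
  shutdown_cron = format("cron(%d %d * * ? *)", tonumber(local.shutdown_parts[1]), tonumber(local.shutdown_parts[0]))
  start_cron    = format("cron(%d %d ? * MON-FRI *)", tonumber(local.start_parts[1]), tonumber(local.start_parts[0]))
}

resource "aws_scheduler_schedule" "environment_shutdown" {
  count = var.environment_operations_enabled ? 1 : 0

  name                         = "environment-shutdown" # hypothetical name
  schedule_expression          = local.shutdown_cron
  schedule_expression_timezone = var.environment_operations_timezone
  # Provisioned but DISABLED unless schedules are explicitly enabled.
  state = var.environment_operations_schedules_enabled ? "ENABLED" : "DISABLED"

  flexible_time_window {
    mode = "OFF"
  }

  target {
    # Hypothetical Lambda and IAM role addresses.
    arn      = aws_lambda_function.environment_shutdown.arn
    role_arn = aws_iam_role.environment_scheduler.arn
  }
}
```

A weekday start schedule would follow the same pattern with `local.start_cron`, and the 20:30 cleanup would use `cron(30 20 * * ? *)`.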

Dashboard database controls

Variable	Purpose	Current shape
`dashboard_db_identifier`	DB instance identifier suffix	`rds-atlas-dashboard` in staging example, `rds` in prod
`dashboard_db_name`	initial database name	`atlas_dashboard`
`dashboard_db_instance_class`	RDS instance size	`db.t4g.micro` in staging example, `db.t3.medium` in prod
`dashboard_db_subnet_group_type`	public or private subnet group	`public` in staging example, `private` in prod
`dashboard_db_publicly_accessible`	public accessibility flag	`true` in staging example, `false` in prod
`dashboard_db_multi_az`	Multi-AZ toggle	`false` in staging example, `true` in prod
`dashboard_db_allowed_cidr_blocks`	CIDRs allowed to reach PostgreSQL	open in staging example, constrained in prod
`dashboard_db_deletion_protection`	destroy protection	`false` in staging example, `true` in prod
`dashboard_db_skip_final_snapshot`	destroy-time snapshot behavior	`true` in staging example, `false` in prod
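
For comparison with the staging example, the prod column of the table corresponds to a tfvars fragment like this sketch. `dashboard_db_allowed_cidr_blocks` is left out because the constrained prod ranges are not listed here. Keeping final snapshots (`skip_final_snapshot = false`) also means the underlying `aws_db_instance` needs a `final_snapshot_identifier`, which the AWS provider requires in that case.

```hcl
# Dashboard database posture as committed for prod: private, Multi-AZ,
# protected against destroy, and snapshotted on deletion.
dashboard_db_identifier          = "rds"
dashboard_db_name                = "atlas_dashboard"
dashboard_db_instance_class      = "db.t3.medium"
dashboard_db_subnet_group_type   = "private"
dashboard_db_publicly_accessible = false
dashboard_db_multi_az            = true
dashboard_db_deletion_protection = true
dashboard_db_skip_final_snapshot = false
```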

Camunda database controls

Variable	Purpose	Current shape
`camunda_db_identifier`	DB instance identifier suffix	`rds-atlas-camunda`
`camunda_db_name`	initial database name	`camunda`
`camunda_db_instance_class`	RDS instance size	`db.t4g.micro` in both roots today
`camunda_db_subnet_group_type`	public or private subnet group	`public` in staging example, `private` in prod
`camunda_db_publicly_accessible`	public accessibility flag	`true` in staging example, `false` in prod
`camunda_db_multi_az`	Multi-AZ toggle	`false` in staging example, `true` in prod
`camunda_db_allowed_cidr_blocks`	CIDRs allowed to reach PostgreSQL	empty by default; access is mainly through the application path rather than client CIDRs
`camunda_db_deletion_protection`	destroy protection	`false` in staging example, `true` in prod
`camunda_db_skip_final_snapshot`	destroy-time snapshot behavior	`true` in staging example, `false` in prod
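
Because `camunda_db_allowed_cidr_blocks` is empty by default, a CIDR-driven ingress rule opens nothing unless ranges are supplied. A minimal sketch of that pattern, assuming a hypothetical `aws_security_group.camunda_db` and PostgreSQL's default port 5432:

```hcl
# Sketch: one PostgreSQL ingress rule per allowed CIDR. With the default empty
# list, for_each yields no rules, so no CIDR-based access is opened.
resource "aws_vpc_security_group_ingress_rule" "camunda_db_cidr" {
  for_each = toset(var.camunda_db_allowed_cidr_blocks)

  # Hypothetical security group address for the Camunda database.
  security_group_id = aws_security_group.camunda_db.id
  cidr_ipv4         = each.value
  ip_protocol       = "tcp"
  from_port         = 5432 # assumed default PostgreSQL port
  to_port           = 5432
  description       = "PostgreSQL from ${each.value}"
}
```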

Tip: When in doubt, compare `terraform/staging/terraform.tfvars.example` with `terraform/prod/production.auto.tfvars`. The comparison shows most clearly how Atlas keeps the environments at parity while giving each one a different safety and access posture.
