Inputs and variables

The root inputs live in terraform/staging/variables.tf, terraform/staging2/variables.tf, and terraform/prod/variables.tf. All three roots share the same interface, but staging2 intentionally reuses the VPC, public/private subnets, security groups, and MSK cluster already created by staging.

Identity and ownership

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `aws_region` | target AWS region | us-east-1 in both roots |
| `project_name` | naming and tagging prefix | poc-atlas in staging example, atlas in prod |
| `environment` | environment tag and naming suffix | dev in staging example, prod in prod |
| `owner_email` | alert subscription target and ownership tag | required in both roots |

Ingress and hostnames

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `alb_certificate_id` | ACM certificate for the ALB HTTPS listener | 140dbe42-48a0-467b-bbfd-1c0d69c5a492 in staging, fc539f7b-c3ab-4e90-882f-82392fc8b7a3 in prod |
| `alb_ingress_cidrs` | client allow-list for ALB HTTP/HTTPS | 0.0.0.0/0 in current staging example and prod values |
| `alb_target_group_deregistration_delay_seconds` | connection draining delay shared by all ALB-managed target groups | 10 seconds in staging, 300 seconds in prod |
| `events_ingestion_host` | host routed to the events API | atlas-ingest.twinfo.io in staging, atlas-ingest.lifters.tech in prod |
| `dashboard_backend_host` | host routed to the dashboard backend | atlas-back.twinfo.io in staging, atlas-back.lifters.tech in prod |
| `kafka_ui_host` | host routed to Kafka UI | atlas-kafka.twinfo.io in staging, atlas-kafka.lifters.tech in prod |
| `*_listener_priority` | ALB host-header rule priority | events 1, dashboard 2, Kafka UI 3 |

For staging2, the committed example uses atlas-ingest2.twinfo.io, atlas-back2.twinfo.io, and atlas-kafka2.twinfo.io with listener priorities 11, 12, and 13.
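
As a rough illustration, the staging2 hostnames described above would look something like this in a tfvars file. The variable names and values come from this page; the file layout itself is an assumption.

```hcl
# Sketch of the staging2 ingress values (layout assumed, values from this page).
events_ingestion_host  = "atlas-ingest2.twinfo.io"
dashboard_backend_host = "atlas-back2.twinfo.io"
kafka_ui_host          = "atlas-kafka2.twinfo.io"

# The three *_listener_priority variables are set to 11, 12, and 13 in the
# committed example. Their full names are not listed on this page, so they
# are left out of this sketch rather than guessed.
```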

Networking and environment shape

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `vpc_cidr_block` | primary VPC CIDR | 10.0.0.0/16 in staging example, 10.20.0.0/16 in prod |
| `private_vpc_peering_routes` | extra routes on each private route table | empty in staging example, defined in prod |
| `vpc_flow_logs_retention_days` | retention for /vpc/flow-logs | 1 day in staging example, 3 days in prod |
| `vpc_flow_logs_traffic_type` | traffic type captured by /vpc/flow-logs | REJECT in both committed roots |
| `elasticache_valkey_replication_group_id` | ElastiCache Valkey replication group identifier | atlas-redis-dev in committed staging values, atlas-redis in committed prod values |
| `elasticache_valkey_subnet_group_name` | ElastiCache Valkey subnet group name | atlas-elasticache-private-dev in committed staging values, atlas-elasticache-private in committed prod values |
| `elasticache_valkey_engine_version` | Valkey engine version | 8.2 |
| `elasticache_valkey_node_type` | ElastiCache node type | cache.t3.medium |
| `elasticache_valkey_number_of_replicas` | replicas used by the cluster-mode-disabled replication group | 1 in both committed roots |
| `elasticache_valkey_snapshot_retention_limit` | automatic snapshot retention in days | 1 in both committed roots |
| `elasticache_valkey_auto_minor_version_upgrade` | minor-version auto upgrade toggle | false in both committed roots |
| `elasticache_valkey_multi_az_enabled` | Multi-AZ toggle for the cache | true in both committed roots |
| `elasticache_valkey_transit_encryption_enabled` | in-transit encryption toggle | false in both committed roots |
| `elasticache_valkey_at_rest_encryption_enabled` | at-rest encryption toggle | false in both committed roots |

staging2 does not create a second VPC or a second MSK cluster. It creates a second workload plane and appends 2 to the duplicated resource identifiers while attaching those resources to the shared staging VPC, shared subnets, and shared security groups.
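
One way a root could attach to an existing network instead of creating its own is through data sources. The sketch below is not taken from the Atlas repository: the `shared_vpc_name` variable, the `Tier` tag, and the way the suffix is applied are all hypothetical, and the real staging2 root may wire this differently (for example through remote state).

```hcl
# Hypothetical input: Name tag of the VPC that the staging root created.
variable "shared_vpc_name" {
  description = "Name tag of the shared staging VPC (assumed for this sketch)."
  type        = string
}

# Look up the existing VPC rather than declaring an aws_vpc resource.
data "aws_vpc" "shared" {
  tags = {
    Name = var.shared_vpc_name
  }
}

# Look up the private subnets inside that VPC.
data "aws_subnets" "shared_private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private" # hypothetical tag, not confirmed by this page
  }
}

locals {
  # staging2 appends "2" to the duplicated resource identifiers.
  workload_suffix = "2"

  # Illustrative only: evaluates to "events-ingestion2".
  events_service_suffix = "events-ingestion${local.workload_suffix}"
}
```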

Events and image bootstrap

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `events_service_name_suffix` | events ECS service suffix | events-ingestion |
| `events_ecr_repository_suffix` | events ECR repository suffix | events-ingestion |
| `events_service_desired_count` | steady-state desired count for the events ECS service | 1 in both roots |
| `events_container_insights_setting` | Container Insights mode for the shared ECS cluster | enabled in both committed roots |
| `events_task_cpu` | task-level CPU for the events workload | 1024 in the committed roots |
| `events_task_memory` | task-level memory for the events workload | 2048 MiB in the committed roots |
| `events_app_container_cpu` | CPU reserved for the events app container | 768 in the committed roots |
| `events_app_container_memory_reservation` | memory reservation for the events app container | 1024 MiB in the committed roots |
| `events_enable_newrelic_sidecar` | enables the optional newrelic-infra sidecar on the events task | true in both committed roots |
| `events_newrelic_sidecar_image` | pinned sidecar image reference | newrelic/nri-ecs:1.13.9 |
| `events_newrelic_sidecar_cpu` | CPU reserved for the New Relic sidecar | 256 |
| `events_newrelic_sidecar_memory_reservation` | memory reservation for the New Relic sidecar | 512 MiB |
| `dashboard_backend_desired_count` | steady-state desired count for the dashboard backend ECS service | 1 in both roots |
| `scoring_desired_count` | steady-state desired count for the scoring ECS service | 1 in both roots |
| `camunda_desired_count` | steady-state desired count for the Camunda ECS service | 1 in both roots |
| `kafka_ui_desired_count` | steady-state desired count for the Kafka UI ECS service | 1 in both roots |
| `events_log_retention_days` | events service log retention | 1 day in staging example, 3 days in prod |
| `events_newrelic_log_retention_days` | events New Relic sidecar log retention | 1 day in staging example, 7 days in prod |
| `dashboard_backend_log_retention_days` | dashboard service log retention | 1 day in staging example, 3 days in prod |
| `scoring_log_retention_days` | scoring service log retention | 1 day in staging example, 3 days in prod |
| `camunda_log_retention_days` | Camunda service log retention | 1 day in staging example, 3 days in prod |
| `kafka_ui_log_retention_days` | Kafka UI log retention | 1 day in staging example, 3 days in prod |
| `camunda_image` | upstream Camunda image reference | camunda/camunda-bpm-platform:7.22.0 |
| `camunda_cpu_architecture` | Camunda task architecture override | X86_64 in current committed values |
| `clickhouse_prometheus_agent_enabled` | enables the dedicated ECS Prometheus agent for ClickHouse Cloud metrics | true in staging and prod |
| `clickhouse_prometheus_agent_image` | Prometheus image used by the collector task | prom/prometheus:v3.11.2 |
| `clickhouse_prometheus_agent_desired_count` | desired ECS task count for the collector | 0 in both committed roots until the collector should actively run |
| `clickhouse_prometheus_agent_cpu` | CPU units for the collector task | 256 |
| `clickhouse_prometheus_agent_memory` | memory in MiB for the collector task | 512 |
| `clickhouse_prometheus_agent_log_retention_days` | collector CloudWatch log retention | 1 day in staging, 3 days in prod |
| `clickhouse_prometheus_agent_scrape_interval` | ClickHouse Cloud Prometheus scrape interval | 60s |
| `events_newrelic_dashboard_enabled` | enables management of the reusable New Relic ECS service dashboard | true in prod; staging uses the variable default unless overridden |
| `events_newrelic_dashboard_name` | dashboard name used for the events ingestion service dashboard | environment-specific; staging defaults to Atlas staging events dashboard |
| `events_newrelic_dashboard_page_name` | page name used inside the events ingestion service dashboard | environment-specific; staging defaults to Atlas staging dashboard |
| `events_newrelic_apm_entity_guid` | APM entity GUID used by throughput/error widgets in the events service dashboard | prod is pinned in committed values; staging is supplied when desired |
| `ecs_service_newrelic_dashboards_enabled` | enables New Relic dashboards for dashboard backend, scoring, Camunda, and Kafka UI from AWS CloudWatch metrics | true in committed roots |
| `dashboard_backend_newrelic_apm_entity_guid` | optional APM entity GUID that adds backend throughput/error widgets to the dashboard backend ECS dashboard | empty by default |
| `rds_newrelic_dashboard_enabled` | enables the shared RDS New Relic dashboard for dashboard and Camunda databases | true in committed roots |
| `msk_newrelic_dashboard_enabled` | enables the MSK New Relic dashboard | true in committed roots |

New Relic AWS account integration

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `newrelic_account_id` | target New Relic account ID for the AWS linked account | 7848378 in committed roots |
| `newrelic_region` | New Relic account region | US in committed roots |
| `newrelic_api_key` | New Relic user API key used by the Terraform provider | supplied externally via `TF_VAR_newrelic_api_key` |
| `newrelic_aws_integration_enabled` | enables the AWS pull integration module in the target root | true in both committed roots |
| `newrelic_aws_linked_account_name` | display name for the linked AWS account inside New Relic | environment-specific, derived from the root prefix in committed values |
| `newrelic_aws_trusted_account_id` | AWS account ID used by New Relic to assume the integration role | 754728514883 in committed roots |
| `newrelic_aws_regions` | AWS regions covered by the pull integrations | `["us-east-1"]` in committed roots |
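
Because `newrelic_api_key` is supplied through the environment, its declaration plausibly has no default and is marked sensitive. The block below is a minimal sketch of such a declaration, not a copy of the committed `variables.tf`; the description text and the `sensitive` flag are assumptions.

```hcl
# Sketch of a declaration for an externally supplied secret (assumed shape).
variable "newrelic_api_key" {
  description = "New Relic user API key used by the Terraform provider."
  type        = string
  sensitive   = true # assumption: keeps the value out of plan output

  # No default on purpose. Terraform reads the value from the
  # TF_VAR_newrelic_api_key environment variable at plan/apply time.
}
```

Terraform maps any environment variable named `TF_VAR_<name>` to the input variable `<name>`, so the key never has to appear in a committed tfvars file.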

MSK and sink controls

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `msk_broker_instance_type` | Kafka broker class | kafka.m7g.xlarge in active staging values, kafka.m5.large in prod |
| `msk_ebs_volume_size_gib` | EBS storage attached to each broker | 500 GiB in active staging values, 20 GiB in prod |
| `msk_cloudwatch_enhanced_monitoring` | CloudWatch enhanced monitoring level for MSK | DEFAULT in examples; PER_BROKER in active staging/prod values |
| `msk_enable_multi_vpc_connectivity` | enables multi-VPC connectivity support | false in staging example, true in prod |
| `msk_enable_public_access` | enables public broker endpoints with service-provided EIPs | true in both committed roots |
| `msk_subnet_type` | public or private broker placement | public in both committed roots while public access is enabled |
| `msk_public_access_cidrs` | CIDRs allowed to reach public IAM + TLS on 9198 | open in examples; restrict before apply when source ranges are known |
| `enable_msk_s3_sink` | enables optional MSK Connect to S3 | enabled in the committed examples |
| `create_msk_connect_plugin_bucket` | creates the plugin bucket | enabled |
| `msk_s3_sink_plugin_file_key` | required ZIP object key when sink is on | set only after upload |
| `msk_s3_sink_topics_regex` | topic selector regex | `atlas\\.events\\..*` |
| `msk_s3_sink_partition_fields` | S3 partitioning fields | organization_id, brand_id |
| `msk_cleanup_topic_name_prefixes` | prefixes used by the cleanup Lambda to discover ephemeral topics | `["atlas.events."]` in both roots |
| `msk_cleanup_topics` | explicit topic definitions recreated by the cleanup Lambda after discovery-based cleanup | atlas.events.raw and atlas.events.dlq by default |
| `msk_disk_usage_critical_threshold` | critical threshold for native KafkaDataLogsDiskUsed alarms | 80 in the committed roots |
| `msk_cpu_user_high_threshold` | threshold for native CpuUser broker alarms | 60 in the committed roots |
| `msk_memory_available_threshold_gib` | low estimated available-memory threshold, converted to bytes against MemoryFree + MemoryCached + MemoryBuffered | 2 GiB in the committed roots |
| `msk_swap_used_threshold_bytes` | threshold for native SwapUsed broker alarms | 1 byte in the committed roots, alerting on any swap usage |
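
The memory threshold is expressed in GiB but compared against byte-valued broker metrics. The sketch below shows one way that conversion and the MemoryFree + MemoryCached + MemoryBuffered sum could be expressed with CloudWatch metric math. It is an illustration under assumptions: `msk_cluster_name`, `msk_broker_id`, the alarm name, the period, and the evaluation count are hypothetical and not taken from the Atlas modules.

```hcl
variable "msk_memory_available_threshold_gib" {
  description = "Low estimated available-memory threshold in GiB."
  type        = number
}

# Hypothetical inputs, introduced only for this sketch.
variable "msk_cluster_name" {
  type = string
}

variable "msk_broker_id" {
  type    = string
  default = "1"
}

locals {
  bytes_per_gib = 1024 * 1024 * 1024

  # With the committed value of 2 GiB this is 2147483648 bytes.
  msk_memory_available_threshold_bytes = var.msk_memory_available_threshold_gib * local.bytes_per_gib
}

resource "aws_cloudwatch_metric_alarm" "msk_memory_available_low" {
  alarm_name          = "${var.msk_cluster_name}-broker-${var.msk_broker_id}-memory-available-low"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 3 # assumed
  threshold           = local.msk_memory_available_threshold_bytes

  # Metric math: estimated available memory is the sum of the three inputs.
  metric_query {
    id          = "available"
    expression  = "free + cached + buffered"
    label       = "Estimated available memory (bytes)"
    return_data = true
  }

  # One input query per native broker metric.
  dynamic "metric_query" {
    for_each = {
      free     = "MemoryFree"
      cached   = "MemoryCached"
      buffered = "MemoryBuffered"
    }

    content {
      id = metric_query.key

      metric {
        namespace   = "AWS/Kafka"
        metric_name = metric_query.value
        period      = 300 # assumed
        stat        = "Average"

        dimensions = {
          "Cluster Name" = var.msk_cluster_name
          "Broker ID"    = var.msk_broker_id
        }
      }
    }
  }
}
```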

Monitoring controls

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `monitoring_slack_notifications_enabled` | creates the dedicated Slack SNS topic and Slack notifier Lambda | true in staging, false in prod |
| `monitoring_slack_webhook_url` | Slack incoming webhook URL supplied outside Git when Slack delivery is enabled | unset in the committed roots |
| `monitoring_slack_log_retention_days` | log retention for the dedicated Slack notifier Lambda | 7 days in the committed roots |
| `ecs_cpu_utilization_high_threshold` | threshold for ECS CPUUtilization alarms | 80 in the committed roots |
| `ecs_memory_utilization_high_threshold` | threshold for ECS MemoryUtilization alarms | 80 in the committed roots |
| `rds_cpu_utilization_high_threshold` | threshold for RDS CPUUtilization alarms | 80 in the committed roots |
| `rds_db_load_high_threshold` | threshold for RDS DBLoad alarms | 4 in the committed roots |
| `rds_free_storage_space_low_threshold_gib` | low free-storage threshold for RDS FreeStorageSpace, converted to bytes for the native metric alarm | 5 GiB in the committed roots |

Scheduled environment operations

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `environment_operations_enabled` | provisions the environment operation Lambdas and Scheduler resources | true in staging, false in prod |
| `environment_operations_schedules_enabled` | keeps the provisioned EventBridge schedules enabled instead of suspended | false in both roots |
| `environment_operations_timezone` | local timezone for the schedules | America/Fortaleza |
| `environment_shutdown_time` | daily local shutdown time | 20:00 |
| `environment_start_time` | weekday local startup time | 08:00 |
| `environment_cleanup_time` | daily local cleanup time for Kafka topic recreation | 20:30 |
| `environment_operations_log_retention_days` | log retention for the operation Lambdas | 7 days in staging example, 3 days in prod |
| `environment_operations_alarm_investigation_enabled` | test-only toggle for the SNS-driven DevOps Agent investigation Lambda inside environment-operations | test wiring only; not an implemented operational alerting path |
| `environment_operations_alarm_investigation_webhook_url` | DevOps Agent generic webhook URL passed to the test Lambda environment | test value in staging values, unset in prod |
| `environment_operations_alarm_investigation_webhook_secret` | DevOps Agent generic webhook HMAC secret passed to the test Lambda environment | test value in staging values, unset in prod |
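
To make the schedule inputs concrete, the sketch below shows how a daily 20:00 shutdown and a weekday 08:00 start might map onto EventBridge Scheduler resources. It is an assumption-heavy illustration: the resource names, the hard-coded cron strings, and the `operation_lambda_arn` and `scheduler_role_arn` inputs are hypothetical, and the real module may derive the expressions from the time variables instead.

```hcl
variable "environment_operations_schedules_enabled" {
  type = bool
}

variable "environment_operations_timezone" {
  type = string # "America/Fortaleza" on this page
}

# Hypothetical inputs, introduced only for this sketch.
variable "operation_lambda_arn" {
  type = string
}

variable "scheduler_role_arn" {
  type = string
}

locals {
  # Schedules stay provisioned but suspended while the toggle is false.
  schedule_state = var.environment_operations_schedules_enabled ? "ENABLED" : "DISABLED"
}

resource "aws_scheduler_schedule" "shutdown" {
  name  = "environment-shutdown" # hypothetical name
  state = local.schedule_state

  # 20:00 every day, evaluated in the configured local timezone.
  schedule_expression          = "cron(0 20 * * ? *)"
  schedule_expression_timezone = var.environment_operations_timezone

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = var.operation_lambda_arn
    role_arn = var.scheduler_role_arn
  }
}

resource "aws_scheduler_schedule" "start" {
  name  = "environment-start" # hypothetical name
  state = local.schedule_state

  # 08:00 Monday through Friday, evaluated in the configured local timezone.
  schedule_expression          = "cron(0 8 ? * MON-FRI *)"
  schedule_expression_timezone = var.environment_operations_timezone

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = var.operation_lambda_arn
    role_arn = var.scheduler_role_arn
  }
}
```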

Dashboard database controls

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `dashboard_db_identifier` | DB instance identifier suffix | rds-atlas-dashboard in staging example, rds in prod |
| `dashboard_db_name` | initial database name | atlas_dashboard |
| `dashboard_db_instance_class` | RDS instance size | db.t4g.micro in staging example, db.t3.medium in prod |
| `dashboard_db_subnet_group_type` | public or private subnet group | public in staging example, private in prod |
| `dashboard_db_publicly_accessible` | public accessibility flag | true in staging example, false in prod |
| `dashboard_db_multi_az` | Multi-AZ toggle | false in staging example, true in prod |
| `dashboard_db_allowed_cidr_blocks` | CIDRs allowed to reach PostgreSQL | open in staging example, constrained in prod |
| `dashboard_db_deletion_protection` | destroy protection | false in staging example, true in prod |
| `dashboard_db_skip_final_snapshot` | destroy-time snapshot behavior | true in staging example, false in prod |

Camunda database controls

| Variable | Purpose | Current shape |
| --- | --- | --- |
| `camunda_db_identifier` | DB instance identifier suffix | rds-atlas-camunda |
| `camunda_db_name` | initial database name | camunda |
| `camunda_db_instance_class` | RDS instance size | db.t4g.micro in both roots today |
| `camunda_db_subnet_group_type` | public or private subnet group | public in staging example, private in prod |
| `camunda_db_publicly_accessible` | public accessibility flag | true in staging example, false in prod |
| `camunda_db_multi_az` | Multi-AZ toggle | false in staging example, true in prod |
| `camunda_db_allowed_cidr_blocks` | CIDRs allowed to reach PostgreSQL | empty by default; application path is the main access model |
| `camunda_db_deletion_protection` | destroy protection | false in staging example, true in prod |
| `camunda_db_skip_final_snapshot` | destroy-time snapshot behavior | true in staging example, false in prod |

Tip

When in doubt, compare terraform/staging/terraform.tfvars.example with terraform/prod/production.auto.tfvars. The differences between the two files give the clearest view of how Atlas keeps the environments structurally identical while applying different safety and access postures.
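
As an example of what that comparison surfaces, the fragment below lines up the dashboard database posture from the two files. The values come from the table on this page; the side-by-side comment layout is only a reading aid and does not reproduce either file verbatim.

```hcl
# Values as in terraform/staging/terraform.tfvars.example.
# The trailing comment on each line shows the prod counterpart
# from terraform/prod/production.auto.tfvars.
dashboard_db_instance_class      = "db.t4g.micro" # prod: "db.t3.medium"
dashboard_db_subnet_group_type   = "public"       # prod: "private"
dashboard_db_publicly_accessible = true           # prod: false
dashboard_db_multi_az            = false          # prod: true
dashboard_db_deletion_protection = false          # prod: true
dashboard_db_skip_final_snapshot = true           # prod: false
```

The interface is the same in both roots; only the values change, with prod choosing the private, redundant, and destroy-protected option in each case.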