4 posts tagged with "ecs"

DevOps Agent investigation test and broader alarm coverage

· 2 min read
Atlas Infra

Atlas Infra tested an AWS DevOps Agent investigation path fed by the existing staging alerts SNS topic, and the CloudWatch alarm set now states more explicitly which services and databases are under pressure.

  • the environment-operations alarm_investigation Lambda was used as a proof-of-concept against the shared alerts SNS topic
  • the test Lambda reacted only to CloudWatch alarm notifications that reported a transition into the ALARM state
  • the test webhook request was signed with HMAC and included the alarm name, environment, region, account, description, metrics, and a CloudWatch console link
  • ECS task-count alarms now cover events-ingestion, dashboard-backend, scoring, and camunda
  • Kafka UI stays out of the task-count alarm set
  • both RDS instances now have CPUUtilization and DBLoad alarms
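
The filtering and signing steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual alarm_investigation code: the header name `X-Signature-SHA256`, the payload key names, the use of SHA-256, and the helper names are all assumptions. The fields read from the SNS message (`AlarmName`, `NewStateValue`, `AlarmArn`, `AWSAccountId`, `AlarmDescription`, `Trigger`) follow the standard CloudWatch alarm notification format.

```python
import hashlib
import hmac
import json
import urllib.parse
import urllib.request

SIGNATURE_HEADER = "X-Signature-SHA256"  # hypothetical header name


def sign(body: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def console_link(region: str, alarm_name: str) -> str:
    """Build a CloudWatch console URL for one alarm."""
    quoted = urllib.parse.quote(alarm_name, safe="")
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#alarmsV2:alarm/{quoted}"
    )


def alarm_payloads(event: dict, environment: str) -> list:
    """Turn an SNS event into webhook payloads, keeping only ALARM transitions."""
    payloads = []
    for record in event.get("Records", []):
        try:
            message = json.loads(record["Sns"]["Message"])
        except ValueError:
            continue  # a shared topic may carry messages that are not JSON
        if not isinstance(message, dict) or message.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        # The region code is the fourth field of the alarm ARN.
        region = message["AlarmArn"].split(":")[3]
        trigger = message.get("Trigger", {})
        payloads.append({
            "alarm_name": message["AlarmName"],
            "environment": environment,
            "region": region,
            "account": message.get("AWSAccountId"),
            "description": message.get("AlarmDescription"),
            "metrics": {
                "namespace": trigger.get("Namespace"),
                "metric_name": trigger.get("MetricName"),
                "dimensions": trigger.get("Dimensions", []),
            },
            "console_link": console_link(region, message["AlarmName"]),
        })
    return payloads


def build_request(payload: dict, url: str, secret: bytes) -> urllib.request.Request:
    """Prepare a signed POST; the caller sends it with urllib.request.urlopen."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    headers = {"Content-Type": "application/json", SIGNATURE_HEADER: sign(body, secret)}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The signature is computed over the exact bytes that are sent, so a receiver can recompute the digest from the raw body and compare it with `hmac.compare_digest` before parsing the JSON.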

Scoring and Camunda services added to the platform

· One min read
Atlas Infra

Atlas Infra now includes the new scoring workload and its Camunda runtime as part of the shared ECS platform.

  • atlas-scoring runs as a dedicated ECS/Fargate service with its own ECR repository and runtime secret
  • scoring consumes atlas.l2.transaction.deposit and publishes atlas.l3.user.score through the existing MSK cluster
  • Camunda runs as a separate ECS/Fargate service with its own PostgreSQL RDS instance and runtime secret
  • internal callers reach scoring through http://scoring:8083 over ECS Service Connect
  • scoring reaches Camunda internally through ECS Service Connect instead of depending on a public hostname
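
From a caller's point of view, the Service Connect setup above means the short name `scoring` resolves inside the ECS namespace, so no public hostname or load balancer URL appears in client code. A minimal sketch of such a caller follows; the `/health` path and the helper names are hypothetical, and only the base URL `http://scoring:8083` comes from the post.

```python
import urllib.request

SCORING_BASE_URL = "http://scoring:8083"  # Service Connect name and port from the post


def scoring_url(path: str) -> str:
    """Join the Service Connect base URL with a request path."""
    return SCORING_BASE_URL.rstrip("/") + "/" + path.lstrip("/")


def fetch(path: str, timeout: float = 2.0) -> bytes:
    """GET a path on the scoring service; resolves only inside the ECS namespace."""
    with urllib.request.urlopen(scoring_url(path), timeout=timeout) as response:
        return response.read()
```

A call such as `fetch("/health")` (hypothetical endpoint) would work only from a task that participates in the same Service Connect namespace.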