DevOps Agent investigation test and broader alarm coverage

· 2 min read
Atlas Infra

Atlas Infra tested an AWS DevOps Agent investigation path fed by the existing staging alerts SNS topic, and broadened the CloudWatch alarm set so that it is more explicit about which services and databases are under pressure.

  • the environment-operations alarm_investigation Lambda was used as a proof of concept against the shared alerts SNS topic
  • the test Lambda only reacted to CloudWatch alarm notifications that reported a transition into ALARM
  • the test webhook request was signed with HMAC and included the alarm name, environment, region, account, description, metrics, and a CloudWatch console link
  • ECS task-count alarms now cover events-ingestion, dashboard-backend, scoring, and camunda
  • Kafka UI stays out of the task-count alarm set
  • both RDS instances now have CPUUtilization and DBLoad alarms
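
The investigation path described in the first three bullets can be sketched roughly as follows. This is a minimal illustration, not the actual alarm_investigation Lambda: the function names, the `X-Signature-SHA256` header, the payload keys, and the choice of SHA-256 for the HMAC are all assumptions. The input fields (`AlarmName`, `AlarmArn`, `AWSAccountId`, `NewStateValue`, `AlarmDescription`, `Trigger`) follow the JSON message that CloudWatch alarms publish to SNS.

```python
import hashlib
import hmac
import json
from urllib.parse import quote


def alarms_entering_alarm(event):
    """Yield the CloudWatch alarm messages in an SNS event whose new state is ALARM."""
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # OK and INSUFFICIENT_DATA notifications are ignored.
        if message.get("NewStateValue") == "ALARM":
            yield message


def build_payload(alarm, environment):
    """Collect the fields listed in the post into one dict (key names are hypothetical)."""
    # An alarm ARN has the form arn:aws:cloudwatch:<region>:<account>:alarm:<name>.
    region = alarm["AlarmArn"].split(":")[3]
    trigger = alarm.get("Trigger", {})
    console_url = (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#alarmsV2:alarm/{quote(alarm['AlarmName'], safe='')}"
    )
    return {
        "alarm_name": alarm["AlarmName"],
        "environment": environment,
        "region": region,
        "account": alarm["AWSAccountId"],
        "description": alarm.get("AlarmDescription"),
        "metrics": [
            {"namespace": trigger.get("Namespace"), "name": trigger.get("MetricName")}
        ],
        "console_url": console_url,
    }


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC of the exact request body (SHA-256 is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def prepare_request(alarm, environment, secret: bytes):
    """Return (body, headers) for the webhook call; sending it is left out here."""
    body = json.dumps(build_payload(alarm, environment), sort_keys=True).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Signature-SHA256": sign(body, secret),  # hypothetical header name
    }
    return body, headers
```

Signing the serialized bytes, and not the dict, means the receiver can recompute the HMAC over the raw request body and compare it with `hmac.compare_digest` before parsing anything.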