DevOps Agent investigation test and broader alarm coverage
Atlas Infra tested an AWS DevOps Agent investigation path fed by the existing staging alerts SNS topic, and the CloudWatch alarm set now shows more explicitly which services and databases are under pressure.
- the `environment-operations` `alarm_investigation` Lambda was used as a proof-of-concept against the shared alerts SNS topic
- the test Lambda only reacted when a CloudWatch alarm notification entered `ALARM`
- the test webhook request was signed with HMAC and included the alarm name, environment, region, account, description, metrics, and a CloudWatch console link
- ECS task-count alarms now cover `events-ingestion`, `dashboard-backend`, `scoring`, and `camunda`
- Kafka UI stays out of the task-count alarm set
- both RDS instances now have `CPUUtilization` and `DBLoad` alarms
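The filter-and-sign behaviour of the test Lambda can be sketched roughly as follows. This is a minimal Python sketch, assuming the Lambda is subscribed directly to the SNS topic and receives the standard CloudWatch alarm JSON in `Records[].Sns.Message`. The payload field names, the `X-Signature-256` header and its `sha256=` prefix, the `ENVIRONMENT` and `WEBHOOK_SECRET` environment variables, and the console URL shape are all hypothetical; the post does not describe the real webhook contract, and the HTTP send is left out.

```python
import hashlib
import hmac
import json
import os
from urllib.parse import quote

# Hypothetical header name; the real webhook contract is not described in the post.
SIGNATURE_HEADER = "X-Signature-256"


def region_from_arn(alarm_arn: str) -> str:
    # Alarm ARNs look like arn:aws:cloudwatch:<region>:<account>:alarm:<name>.
    return alarm_arn.split(":")[3]


def console_link(region: str, alarm_name: str) -> str:
    # Assumed shape of a CloudWatch console deep link to a single alarm.
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#alarmsV2:alarm/{quote(alarm_name, safe='')}"
    )


def build_webhook_request(message: dict, environment: str, secret: bytes):
    """Return (body, headers) for a transition into ALARM, otherwise None."""
    if message.get("NewStateValue") != "ALARM":
        return None  # ignore OK and INSUFFICIENT_DATA notifications

    trigger = message.get("Trigger", {})
    region = region_from_arn(message["AlarmArn"])
    payload = {  # hypothetical field names
        "alarm_name": message["AlarmName"],
        "environment": environment,
        "region": region,
        "account": message["AWSAccountId"],
        "description": message.get("AlarmDescription") or "",
        "metrics": {
            "namespace": trigger.get("Namespace"),
            "metric_name": trigger.get("MetricName"),
            "dimensions": trigger.get("Dimensions", []),
        },
        "console_link": console_link(region, message["AlarmName"]),
    }
    # Sign the exact bytes that will be sent, so the receiver can verify them as-is.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        SIGNATURE_HEADER: f"sha256={signature}",
    }
    return body, headers


def verify_signature(body: bytes, header_value: str, secret: bytes) -> bool:
    # Receiver side: recompute the HMAC and compare in constant time.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_value)


def handler(event, context):
    """Lambda entry point for SNS invocations; returns a JSON-serialisable summary."""
    environment = os.environ.get("ENVIRONMENT", "staging")  # hypothetical variable
    secret = os.environ["WEBHOOK_SECRET"].encode("utf-8")   # hypothetical variable
    forwarded = 0
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        request = build_webhook_request(message, environment, secret)
        if request is not None:
            # Sending (e.g. with urllib.request) is deliberately omitted here.
            forwarded += 1
    return {"forwarded": forwarded}
```

Signing the serialized body rather than a re-serialized dict keeps sender and receiver in agreement byte for byte, and `hmac.compare_digest` avoids leaking how much of the signature matched through timing.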
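The alarm coverage listed above could be expressed as a set of CloudWatch `put_metric_alarm` parameter dicts: one task-count alarm per covered ECS service, and one `CPUUtilization` plus one `DBLoad` alarm per RDS instance. This is a sketch under assumptions: the cluster name, topic ARN, RDS instance identifiers, the `kafka-ui` service name, the use of the Container Insights `RunningTaskCount` metric, and every threshold and period are illustrative, not the values Atlas Infra uses. `DBLoad` is published by Performance Insights, so it needs that feature enabled on each instance.

```python
# All identifiers below are hypothetical; the post does not name the cluster,
# the topic, or the RDS instances.
CLUSTER_NAME = "staging-cluster"
ALERTS_TOPIC_ARN = "arn:aws:sns:eu-central-1:123456789012:staging-alerts"
ECS_SERVICES = ["events-ingestion", "dashboard-backend", "scoring", "camunda", "kafka-ui"]
EXCLUDED_FROM_TASK_COUNT = {"kafka-ui"}  # Kafka UI stays out of the task-count set
RDS_INSTANCES = ["staging-db-a", "staging-db-b"]
RDS_THRESHOLDS = {"CPUUtilization": 80.0, "DBLoad": 4.0}  # illustrative thresholds


def ecs_task_count_alarm(service: str) -> dict:
    # Fires when fewer than one task is running for three consecutive minutes.
    return {
        "AlarmName": f"{CLUSTER_NAME}-{service}-running-tasks",
        "Namespace": "ECS/ContainerInsights",
        "MetricName": "RunningTaskCount",
        "Dimensions": [
            {"Name": "ClusterName", "Value": CLUSTER_NAME},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data points is treated as no tasks
        "AlarmActions": [ALERTS_TOPIC_ARN],
    }


def rds_alarm(instance_id: str, metric: str, threshold: float) -> dict:
    # Fires when the 5-minute average stays above the threshold for 15 minutes.
    return {
        "AlarmName": f"{instance_id}-{metric}",
        "Namespace": "AWS/RDS",
        "MetricName": metric,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [ALERTS_TOPIC_ARN],
    }


def build_alarm_specs() -> list:
    specs = [
        ecs_task_count_alarm(service)
        for service in ECS_SERVICES
        if service not in EXCLUDED_FROM_TASK_COUNT
    ]
    for instance in RDS_INSTANCES:
        for metric, threshold in RDS_THRESHOLDS.items():
            specs.append(rds_alarm(instance, metric, threshold))
    return specs
```

Each dict can be applied with `boto3.client("cloudwatch").put_metric_alarm(**spec)`. Keeping the exclusion as an explicit set makes it visible in review why Kafka UI has no task-count alarm, and a sensible `DBLoad` threshold usually depends on the instance's vCPU count.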