alerts
Here’s how I did probably a week or so of work in ~6 hours. It started with an idea for how to compose New Relic and PagerDuty via Terraform. The core concept: one module creates the PagerDuty service for the, well, service, and another module creates the New Relic alert policy, which is just an organizational tool for alerts. Once those are created, we combine them by passing their outputs into a third module that creates the bridge between New Relic and PagerDuty.
The key part of this structure is that each alerting module now focuses on one thing: generating alerts for a certain semantic space. In my case, I decided to create alert modules per AWS service (Kinesis, SNS, SQS, Lambda, DynamoDB). Thus, if a role / project in Terraform contains Kinesis, Firehose, Lambda, and DynamoDB, we simply initialize the appropriate alerts modules and pass them into the bridge module.
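To make the composition concrete, here is a rough sketch of what a project's root module could look like. Every module path, input name, and output name in it is made up for illustration, and the wiring is a guess: here each alerts module attaches to the policy through a hypothetical policy_id input, while the bridge only needs the policy and the PagerDuty service. The real modules may be wired differently.

```hcl
# Hypothetical root module for one project. All module paths, input names,
# and output names are illustrative assumptions, not the real modules.

# PagerDuty service for the project.
module "pagerduty_service" {
  source = "./modules/pagerduty-service" # hypothetical path
  name   = "orders-pipeline"             # hypothetical project name
}

# New Relic alert policy: the container that groups this project's alerts.
module "alert_policy" {
  source = "./modules/newrelic-alert-policy" # hypothetical path
  name   = "orders-pipeline"
}

# One alerts module per AWS service the project actually uses.
module "kinesis_alerts" {
  source      = "./modules/alerts/kinesis" # hypothetical path
  policy_id   = module.alert_policy.policy_id
  stream_name = "orders-events" # hypothetical input
}

module "dynamodb_alerts" {
  source     = "./modules/alerts/dynamodb" # hypothetical path
  policy_id  = module.alert_policy.policy_id
  table_name = "orders" # hypothetical input
}

# Bridge: routes incidents from the alert policy to the PagerDuty service.
module "bridge" {
  source               = "./modules/newrelic-pagerduty-bridge" # hypothetical path
  policy_id            = module.alert_policy.policy_id
  pagerduty_service_id = module.pagerduty_service.service_id
}
```

The point of the layout is that adding coverage for another AWS service is one more module block, and nothing about the PagerDuty or bridge wiring has to change.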
Setting up the above paradigm doesn’t take a week, but implementing the module for each AWS service can take a significant amount of time, because you have to research which alerts to set up. Well, I took everyone’s favorite shortcut and invoked my good friend Claude to generate the modules for the different services. The flow was:
1. Define the input/output interfaces.
2. Get a single module built well.
3. Feed that initial module back in and ask it to generate more.
4. For each module, review the alert queries (our friend occasionally makes things up) and edit as necessary.
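For a sense of what "the input/output interface" of a single alerts module might be, here is a minimal sketch of an SQS one. The variable names, the output name, and the thresholds are assumptions, not the actual module. The metric and attribute names in the NRQL query assume New Relic's CloudWatch Metric Streams naming, which is exactly the kind of thing that needs checking against real data before trusting it.

```hcl
# modules/alerts/sqs/main.tf (hypothetical layout and names)

terraform {
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

variable "policy_id" {
  description = "ID of the New Relic alert policy these conditions attach to."
  type        = string
}

variable "queue_name" {
  description = "Name of the SQS queue to watch."
  type        = string
}

variable "max_message_age_seconds" {
  description = "Open an incident when the oldest message is older than this."
  type        = number
  default     = 900 # illustrative default, not a recommendation
}

resource "newrelic_nrql_alert_condition" "oldest_message_age" {
  policy_id = var.policy_id
  type      = "static"
  name      = "SQS ${var.queue_name}: oldest message age"
  enabled   = true

  nrql {
    # ASSUMPTION: metric and attribute names follow the Metric Streams
    # convention. Verify them in the query builder before applying.
    query = "SELECT max(aws.sqs.ApproximateAgeOfOldestMessage) FROM Metric WHERE aws.sqs.QueueName = '${var.queue_name}'"
  }

  critical {
    operator              = "above"
    threshold             = var.max_message_age_seconds
    threshold_duration    = 300 # seconds the threshold must be breached
    threshold_occurrences = "all"
  }
}

# Exposed so callers can reference the conditions this module created.
output "condition_ids" {
  value = [newrelic_nrql_alert_condition.oldest_message_age.id]
}
```

Keeping the inputs this small (a policy, a resource name, a few thresholds) is what makes it practical to hand one finished module back as a template and ask for the same shape for the next service.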
Six hours later I have alerts for: ecs-service, ecs-cluster, kinesis, sns, sqs, ec2, route53, lambda, lambda-account, dynamodb, firehose, eventbus, eventbus rule, vpc, and a few others.
From there, it’s a matter of just applying them to the appropriate projects.