At Cloud9 we use Datadog for all our server monitoring and PagerDuty for alerts when things break. To do this we use the standard Datadog + PagerDuty integration and have critical monitors page us automatically by adding @pagerduty to the “Say what’s happening” field in the Datadog monitor.
Unfortunately, Datadog sends a notification both when the monitor triggers and when it recovers. Because we had @pagerduty in the “Say what’s happening” field, we got a PagerDuty call both times.
You can fix this by wrapping the @pagerduty mention in {{#is_alert}}{{/is_alert}}, so your monitor message should look something like:
Docker is having trouble creating containers. Please investigate @slack-datadog @slack-warnings {{#is_alert}}@pagerduty{{/is_alert}}
You can also use {{#is_warning}}@pagerduty{{/is_warning}} for warnings (where the monitor has gone over the warning threshold but not the alert threshold).
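To see why the wrapper stops the recovery page, here is a toy Python model of how the conditional blocks behave. It is only a sketch for illustration, not Datadog's actual template engine: the `render` function, the `BLOCK` regex and the state names passed to it are all made up for this example.

```python
import re

# Toy model only: matches {{#is_xxx}}...{{/is_xxx}} blocks in a monitor message.
# This is NOT how Datadog implements its templates; it just mimics the idea.
BLOCK = re.compile(r"\{\{#(is_\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)


def render(message: str, state: str) -> str:
    """Keep the body of blocks whose condition matches `state`, drop the others."""
    def keep_or_drop(match: re.Match) -> str:
        return match.group(2) if match.group(1) == f"is_{state}" else ""
    return BLOCK.sub(keep_or_drop, message).strip()


MESSAGE = (
    "Docker is having trouble creating containers. Please investigate "
    "@slack-datadog @slack-warnings {{#is_alert}}@pagerduty{{/is_alert}}"
)

if __name__ == "__main__":
    # Hypothetical state names, chosen to line up with the is_* conditions.
    for state in ("alert", "warning", "recovery"):
        pages = "@pagerduty" in render(MESSAGE, state)
        print(f"{state}: pages PagerDuty = {pages}")
```

In this model only the alert state keeps the @pagerduty mention, while the Slack mentions sit outside the block and so appear in every notification.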
Then you can go back to bed safe in the knowledge your server isn’t going to wake you up to tell you “Everything is good, nothing is broken”.