Find noisy alerts and propose threshold changes backed by real paging data
Alert fatigue is one of the fastest ways to burn out an on-call rotation. The alerts waking people up the most are often old monitors nobody has revisited. This recipe audits Datadog monitor history and PagerDuty incidents, ranks noisy alerts by fire rate and self-resolution rate, and proposes threshold changes with the evidence behind each recommendation.
Run an alert tuning audit for a DevOps or SRE engineer.

Goal: Help me find the alerts creating the most noise on my team's rotation and propose data-backed threshold changes I can safely review and ship.

Ask me for:
- Lookback window in days (default 90)
- Service or team scope, if I want to limit the audit
- My team's noise tolerance: how many fires per week is acceptable per alert
- Whether to open a GitHub PR or just produce a report
- The repo where monitor config lives, if I want a PR

Use available integrations this way:
- Datadog: list monitors, query firing history, and pull metric values during fires
- PagerDuty: cross-reference which alerts paged a human and which auto-resolved
- GitHub: locate monitor config files and prepare a PR with proposed threshold changes
- Linear: create tickets for alerts that need owner action beyond a threshold change
- Slack: post a summary of findings to the team channel
- Google Docs: write the audit report

Output:
1. Top 10 noisiest alerts ranked by fire rate and self-resolution rate
2. Alerts that never fire in the lookback window, marked for review
3. Per-alert proposed threshold with the data behind it (percentile, baseline, suggested value)
4. A GitHub PR with the monitor config changes
5. Linear tickets for any alerts that need redesign, not just tuning
6. A Slack summary for the team channel
7. The full audit report in Google Docs

Rules:
- Never auto-merge the PR; human review is required
- Do not propose deleting an alert without a documented reason
- Show the data behind every threshold proposal; no opaque recommendations
- If an alert is tied to an SLO, flag it; SLO alerts get reviewed differently
- Distinguish flapping alerts from genuinely noisy ones; they need different fixes
This recipe runs a noise audit across your alerting stack. It pulls firing history from Datadog and PagerDuty, identifies monitors that fire often or self-resolve quickly, and proposes threshold changes based on observed signal rather than guesswork.
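To make the ranking concrete, here is a minimal sketch of the math behind items 1 and 3 of the output: fires per week, self-resolution rate, and a percentile-based candidate threshold. It assumes the firing events have already been exported from Datadog and PagerDuty into plain records; the field names, the sample data, and the 10-minute self-resolve cutoff are illustrative, not either API's schema.

```python
"""Rank alerts by noise and propose percentile-based thresholds.

Minimal sketch: assumes firing events were already exported from Datadog
(monitor history, metric value at fire time) and PagerDuty (did it page a
human) into plain dicts. Field names and sample values are illustrative.
"""
from collections import defaultdict
from statistics import quantiles

LOOKBACK_DAYS = 90
SELF_RESOLVE_MIN = 10  # fires shorter than this that paged nobody count as noise

SAMPLE_EVENTS = [  # one record per monitor firing (hypothetical data)
    {"monitor": "api p95 latency", "duration_min": 4, "paged_human": False, "value_at_fire": 820.0},
    {"monitor": "api p95 latency", "duration_min": 3, "paged_human": False, "value_at_fire": 790.0},
    {"monitor": "db connection errors", "duration_min": 42, "paged_human": True, "value_at_fire": 130.0},
]

def rank_noise(events, lookback_days=LOOKBACK_DAYS):
    by_monitor = defaultdict(list)
    for event in events:
        by_monitor[event["monitor"]].append(event)

    rows = []
    for monitor, fires in by_monitor.items():
        fires_per_week = len(fires) / (lookback_days / 7)
        self_resolved = sum(
            1 for f in fires
            if f["duration_min"] <= SELF_RESOLVE_MIN and not f["paged_human"]
        ) / len(fires)
        values = sorted(f["value_at_fire"] for f in fires)
        # Candidate threshold: 95th percentile of the metric at fire time.
        # A data-backed starting point for review, not a final answer.
        p95 = quantiles(values, n=20)[-1] if len(values) >= 2 else values[0]
        rows.append({
            "monitor": monitor,
            "fires_per_week": round(fires_per_week, 2),
            "self_resolution_rate": round(self_resolved, 2),
            "proposed_threshold_p95": round(p95, 1),
        })
    # Noisiest first: frequent fires that mostly resolve themselves.
    return sorted(rows, key=lambda r: (r["fires_per_week"], r["self_resolution_rate"]), reverse=True)

if __name__ == "__main__":
    for row in rank_noise(SAMPLE_EVENTS):
        print(row)
```

Over real monitor history, the same numbers become the evidence attached to each proposed threshold, so a reviewer can see why a change is safe before it ships.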
Hand off the pager without making the next engineer reconstruct the shift
On-call handoffs usually happen fast, right when context is easiest to lose. The next engineer starts their shift digging through incidents, deploys, noisy alerts, and half-finished Slack threads just to understand the current state. This recipe pulls the shift's PagerDuty incidents, deploy activity, Datadog alerts, and open Slack threads into a clean handoff brief the next on-call can use immediately.
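As a rough sketch of the first data pull, the snippet below fetches the shift's incidents from the public PagerDuty REST API (GET /incidents) and formats them into a plain-text brief. The PD_API_KEY environment variable, the 12-hour shift window, and the brief layout are assumptions to adapt; a full handoff would also fold in deploys, Datadog alerts, and open Slack threads.

```python
"""Pull the shift's PagerDuty incidents for a handoff brief.

Minimal sketch against the PagerDuty REST API (GET /incidents).
PD_API_KEY, the 12-hour shift window, and the brief format are assumptions.
"""
import os
from datetime import datetime, timedelta, timezone

import requests

PD_API_KEY = os.environ["PD_API_KEY"]  # read-only REST API key (assumed)
SHIFT_HOURS = 12

def shift_incidents():
    until = datetime.now(timezone.utc)
    since = until - timedelta(hours=SHIFT_HOURS)
    resp = requests.get(
        "https://api.pagerduty.com/incidents",
        headers={
            "Authorization": f"Token token={PD_API_KEY}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        params={
            "since": since.isoformat(),
            "until": until.isoformat(),
            "limit": 100,
            "sort_by": "created_at:desc",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["incidents"]

def brief(incidents):
    lines = [f"On-call handoff: {len(incidents)} incidents in the last {SHIFT_HOURS}h"]
    for inc in incidents:
        lines.append(
            f"- [{inc['status']}] {inc['title']} "
            f"({inc['service']['summary']}, opened {inc['created_at']})"
        )
    return "\n".join(lines)

if __name__ == "__main__":
    print(brief(shift_incidents()))
```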
Find stale runbooks before they fail during an incident
Runbooks go stale quietly. Deploy paths change, services move, dependencies get renamed, and the doc only gets noticed at 3 a.m. when it points to infrastructure that no longer exists. This recipe finds runbooks that have not kept up with the services they describe and proposes updates based on recent service changes.
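Here is a minimal sketch of the core staleness check, assuming runbooks live as markdown files under a runbooks/ directory and that you can supply the current service catalog and each service's last deploy date. The crude service-name regex and both heuristics (references to services that no longer exist, docs older than the last deploy of a service they mention) are illustrative, not the recipe's exact method.

```python
"""Flag runbooks that may have drifted from the services they describe.

Minimal sketch: the runbooks/ path, the service catalog, the last-deploy
dates, and the service-name pattern are all assumptions to replace with
real inputs.
"""
import re
from datetime import datetime, timezone
from pathlib import Path

RUNBOOK_DIR = Path("runbooks")                                         # assumed location
CURRENT_SERVICES = {"checkout-api", "payments-worker", "billing-db"}   # assumed catalog
LAST_DEPLOY = {"checkout-api": datetime(2024, 5, 2, tzinfo=timezone.utc)}  # assumed dates

SERVICE_REF = re.compile(r"\b[a-z0-9]+(?:-[a-z0-9]+)+\b")  # crude service-name pattern

def audit():
    findings = []
    for doc in RUNBOOK_DIR.glob("**/*.md"):
        text = doc.read_text(encoding="utf-8")
        refs = set(SERVICE_REF.findall(text))
        # Heuristic 1: the doc names services that are not in the catalog anymore.
        unknown = sorted(r for r in refs if r not in CURRENT_SERVICES)
        # Heuristic 2: the doc has not been touched since a service it covers was redeployed.
        modified = datetime.fromtimestamp(doc.stat().st_mtime, tz=timezone.utc)
        stale_vs_deploy = sorted(
            s for s in refs & CURRENT_SERVICES
            if s in LAST_DEPLOY and modified < LAST_DEPLOY[s]
        )
        if unknown or stale_vs_deploy:
            findings.append((doc, unknown, stale_vs_deploy))
    return findings

if __name__ == "__main__":
    for doc, unknown, stale in audit():
        print(f"{doc}: unknown refs {unknown or '-'}, older than last deploy of {stale or '-'}")
```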
Stop audio drift by quarantining variable-frame-rate clips at ingest
Audio slowly drifts out of sync or randomly desyncs in your timeline when footage is variable frame rate — common with iPhone footage, screen recordings, and some OBS workflows. This recipe catches VFR clips at ingest, transcodes them to constant frame rate, and quarantines the originals so drift never reaches your edit.
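A minimal sketch of the ingest gate, assuming ffprobe and ffmpeg are on PATH: VFR is inferred by comparing the stream's declared and average frame rates, flagged clips are conformed to constant frame rate, and the originals are moved aside. The folder names, the 30 fps target, and the CRF 18 encode settings are assumptions to adjust for your footage.

```python
"""Quarantine variable-frame-rate clips at ingest and conform them to CFR.

Minimal sketch assuming ffprobe and ffmpeg are on PATH. VFR is inferred by
comparing the stream's declared (r_frame_rate) and average (avg_frame_rate)
frame rates; folder names and encode settings are assumptions.
"""
import json
import shutil
import subprocess
from pathlib import Path

INGEST = Path("ingest")          # incoming footage (assumed layout)
QUARANTINE = Path("quarantine")  # originals of flagged clips go here
CONFORMED = Path("conformed")    # constant-frame-rate transcodes go here

def parse_rate(text: str) -> float:
    # ffprobe reports rates as fractions like "30000/1001"; "0/0" means unknown
    num, _, den = text.partition("/")
    return float(num) / float(den) if den and float(den) else 0.0

def looks_vfr(clip: Path, tolerance: float = 0.01) -> bool:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate,avg_frame_rate",
         "-of", "json", str(clip)],
        capture_output=True, text=True, check=True,
    ).stdout
    stream = json.loads(out)["streams"][0]
    declared = parse_rate(stream["r_frame_rate"])
    average = parse_rate(stream["avg_frame_rate"])
    return abs(declared - average) > tolerance

def conform(clip: Path) -> None:
    QUARANTINE.mkdir(exist_ok=True)
    CONFORMED.mkdir(exist_ok=True)
    target = CONFORMED / clip.with_suffix(".mp4").name
    # Re-encode at a constant 30 fps; pick the rate that matches your project.
    subprocess.run(
        ["ffmpeg", "-i", str(clip), "-vsync", "cfr", "-r", "30",
         "-c:v", "libx264", "-crf", "18", "-c:a", "aac", str(target)],
        check=True,
    )
    shutil.move(str(clip), str(QUARANTINE / clip.name))  # keep the original out of the edit

if __name__ == "__main__":
    for clip in INGEST.glob("*.*"):
        if looks_vfr(clip):
            print(f"VFR detected: {clip.name} -> conforming and quarantining")
            conform(clip)
```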
Local-first AI assistant that automates small daily tasks safely on your device
A personal, local-first AI assistant that automates small daily tasks like organizing files, setting reminders, and monitoring system events. It stays away from sensitive data and asks for your approval before taking any risky action.
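A tiny sketch of the approval gate this implies: routine tasks run on their own, anything flagged as risky waits for an explicit yes. The task names and the risky classification are illustrative, and nothing here leaves the device.

```python
"""Approval gate for a local-first assistant.

Tiny sketch: routine tasks run automatically, risky ones need an explicit
yes. Task names and the risky flag are illustrative; nothing calls out to
the network.
"""
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    action: Callable[[], None]
    risky: bool = False  # e.g. deletes files or touches sensitive folders

def run(task: Task) -> None:
    if task.risky:
        answer = input(f"Allow '{task.name}'? [y/N] ").strip().lower()
        if answer != "y":
            print(f"Skipped: {task.name}")
            return
    task.action()
    print(f"Done: {task.name}")

if __name__ == "__main__":
    run(Task("tidy the downloads folder", lambda: print("sorting files"), risky=False))
    run(Task("empty the trash", lambda: print("emptying trash"), risky=True))
```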