Before shipping to prod, get a one-page go/no-go with the receipts
Pre-deploy checks are easy to skip when the change feels small or the team is under pressure. This recipe runs a fast readiness check across GitHub, Datadog, Sentry, and PagerDuty: open incidents, error spikes, missing approvals, recent failed deploys, and change freezes. The output is a defensible go/no-go report for the deploy channel.
Run a pre-deploy readiness check and produce a go/no-go recommendation.

Goal: Help me make a confident go or no-go decision before deploying to production, with the data to defend the call either way.

Ask me for:
- The service or repo being deployed
- The PR or commit being deployed
- Target environment (production, staging)
- Deploy window (now, scheduled time)
- Whether to run a follow-up health check 30 minutes post-deploy
- Slack channel for the readiness report

Use available integrations this way:
- GitHub: list open PRs touching the same files, check CI status, identify the diff scope
- Datadog: snapshot of current service health (latency, error rate, saturation)
- Sentry: recent error spikes and new issues in the service
- PagerDuty: active incidents, recent incidents in the last 24 hours, change freeze status
- Slack: post the readiness report
- Linear: create a ticket if a no-go is recommended and the deploy is blocked

Output:
1. One-page readiness report with sections for each check
2. Health snapshot table (latency, error rate, saturation, error budget)
3. Active and recent incidents on the affected service
4. Diff scope summary: files touched, services affected, blast radius estimate
5. Go or no-go recommendation with reasoning and any caveats
6. A Slack post for the deploy channel
7. Optional follow-up check 30 minutes after the deploy completes

Rules:
- Never block or unblock a deploy directly; produce a recommendation only
- Do not approve a deploy if there is an active SEV1 or SEV2 incident on the service
- Always show the data behind a no-go recommendation
- If error budget is already burned, flag it but defer the call to the human
- Treat staging deploys with lighter checks; the friction should match the risk
This recipe adds a lightweight reliability gate before deploys. It checks the obvious blockers: active incidents, elevated errors, failed recent deploys, missing approvals, and change freezes. Then it produces a decision report you can defend to a reviewer or to yourself during a rollback review.
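If you want to see what the hard incident gate looks like in code, here is a minimal sketch of the PagerDuty check, assuming a REST API key and a known service ID. Treating PagerDuty P1/P2 priorities as SEV1/SEV2 is an assumption; adjust the mapping to however your org labels severity.

```python
import os
import requests

PD_INCIDENTS = "https://api.pagerduty.com/incidents"

def active_blocking_incidents(service_id: str, api_key: str) -> list[dict]:
    """Return still-open incidents on the service whose priority suggests SEV1/SEV2."""
    resp = requests.get(
        PD_INCIDENTS,
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        params={
            "service_ids[]": service_id,
            "statuses[]": ["triggered", "acknowledged"],  # open, not yet resolved
        },
        timeout=10,
    )
    resp.raise_for_status()
    incidents = resp.json()["incidents"]
    # Assumption: P1/P2 priorities map to SEV1/SEV2 in your org.
    return [
        i for i in incidents
        if (i.get("priority") or {}).get("summary") in ("P1", "P2")
    ]

if __name__ == "__main__":
    blockers = active_blocking_incidents("SERVICE_ID", os.environ["PAGERDUTY_API_KEY"])
    verdict = "NO-GO" if blockers else "GO (pending the other checks)"
    print(verdict, [i["html_url"] for i in blockers])
```

Note that, per the rules above, this gate only feeds the recommendation; it never blocks the deploy itself.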
Find noisy alerts and propose threshold changes backed by real paging data
Alert fatigue is one of the fastest ways to burn out an on-call rotation. The alerts waking people up the most are often old monitors nobody has revisited. This recipe audits Datadog monitor history and PagerDuty incidents, ranks noisy alerts by fire rate and self-resolution rate, and proposes threshold changes with the evidence behind each recommendation.
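The ranking itself is simple once the paging history is exported. Here is a sketch of the scoring, using a hypothetical Incident record rather than any particular API's response shape: alerts that fire often and then resolve themselves are the prime candidates for a threshold change.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Incident:
    monitor: str      # monitor or alert name that paged
    resolved_by: str  # "auto" if it recovered on its own, else a responder id

def rank_noisy_alerts(incidents: list[Incident], days: int) -> list[tuple[str, float, float]]:
    """Rank monitors by fires/day and self-resolution rate; high on both means noise."""
    by_monitor: dict[str, list[Incident]] = defaultdict(list)
    for inc in incidents:
        by_monitor[inc.monitor].append(inc)
    scored = []
    for monitor, incs in by_monitor.items():
        fire_rate = len(incs) / days
        self_resolved = sum(1 for i in incs if i.resolved_by == "auto") / len(incs)
        scored.append((monitor, fire_rate, self_resolved))
    # Noisiest first: frequent firing weighted by how often nobody had to act.
    return sorted(scored, key=lambda t: t[1] * t[2], reverse=True)
```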
Find stale runbooks before they fail during an incident
Runbooks go stale quietly. Deploy paths change, services move, dependencies get renamed, and the doc only gets noticed at 3 a.m. when it points to infrastructure that no longer exists. This recipe finds runbooks that have not kept up with the services they describe and proposes updates based on recent service changes.
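At its core the staleness check is a date comparison plus a reference scan. A rough sketch, with hypothetical inputs (the runbook's last-edit time, the service's change history, and a current service catalog); the service-name regex is a placeholder for whatever naming convention you actually use.

```python
import re
from datetime import datetime, timedelta

def runbook_is_stale(runbook_updated_at: datetime,
                     service_changes: list[datetime],
                     grace: timedelta = timedelta(days=30)) -> bool:
    """Flag a runbook if the service has changed meaningfully since its last edit."""
    return any(change > runbook_updated_at + grace for change in service_changes)

def dead_references(runbook_text: str, live_services: set[str]) -> set[str]:
    """Service-looking tokens mentioned in the runbook but absent from the catalog."""
    mentioned = set(re.findall(r"\b[a-z][a-z0-9]*(?:-[a-z0-9]+)+\b", runbook_text))
    return mentioned - live_services
```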
Stop audio drift by quarantining variable-frame-rate clips at ingest
Audio slowly drifts out of sync or randomly desyncs in your timeline when footage is variable frame rate (VFR), which is common with iPhone footage, screen recordings, and some OBS workflows. This recipe catches VFR clips at ingest, transcodes them to constant frame rate, and quarantines the originals so drift never reaches your edit.
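Here is a minimal sketch of the ingest gate, assuming ffprobe and ffmpeg are on your PATH. Comparing a stream's nominal and average frame rates is a common VFR heuristic, not a guarantee; the 30 fps target and encoder settings are placeholder choices.

```python
import json
import shutil
import subprocess
from pathlib import Path

def is_vfr(clip: Path) -> bool:
    """Heuristic: ffprobe reports differing nominal and average frame rates on VFR clips."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate,avg_frame_rate",
         "-of", "json", str(clip)],
        capture_output=True, text=True, check=True,
    )
    stream = json.loads(out.stdout)["streams"][0]
    return stream["r_frame_rate"] != stream["avg_frame_rate"]

def quarantine_and_transcode(clip: Path, quarantine: Path, fps: int = 30) -> Path:
    """Transcode to constant frame rate, then move the original out of the ingest path."""
    fixed = clip.with_name(clip.stem + "_cfr.mp4")
    subprocess.run(
        ["ffmpeg", "-i", str(clip),
         "-vsync", "cfr", "-r", str(fps),  # newer ffmpeg spells this -fps_mode cfr
         "-c:v", "libx264", "-crf", "18", "-c:a", "aac",
         str(fixed)],
        check=True,
    )
    quarantine.mkdir(parents=True, exist_ok=True)
    shutil.move(str(clip), quarantine / clip.name)
    return fixed
```

If a clip passes the heuristic but still drifts, inspecting per-frame timestamps (ffprobe -show_frames) is the more reliable, slower check.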
Catch the "stuck at 99%" class of export failures before they happen
Exports hang or fail late in the process — often near completion — due to insufficient free space, problematic clips, or unstable settings. This recipe checks disk space, validates export targets, and provides a fallback render path before you waste an hour waiting for a doomed export.
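The disk-space half of that check is a few lines around shutil.disk_usage: estimate the export's size from bitrate times duration and require some headroom. A sketch with placeholder bitrates and paths:

```python
import shutil
from pathlib import Path

def export_will_fit(target_dir: Path, duration_s: float,
                    video_kbps: int, audio_kbps: int,
                    headroom: float = 1.5) -> bool:
    """Estimate export size from bitrate x duration and require headroom on disk."""
    estimated_bytes = (video_kbps + audio_kbps) * 1000 / 8 * duration_s
    free = shutil.disk_usage(target_dir).free
    return free > estimated_bytes * headroom

if __name__ == "__main__":
    # Hypothetical numbers: a 20-minute 1080p export at 16 Mbps video + 320 kbps audio.
    ok = export_will_fit(Path("/exports"), 20 * 60, 16000, 320)
    print("GO" if ok else "NO-GO: free up space or lower the bitrate")
```

The 1.5x headroom is deliberately conservative: encoders overshoot their target bitrate, and a scratch file often lives alongside the export until it finishes.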