OpenClaw recipe

GitHub Actions and CircleCI Flaky Test Detection and Triage

aka “CI/CD Flaky Test Hunter”

Find your flakiest tests and turn them into tickets, not retries

Flaky tests rot CI from the inside. Engineers re-run jobs, miss real failures, and stop trusting the pipeline until the only signal is "did the third try pass?" This recipe scans recent GitHub Actions and CircleCI runs, ranks tests by flake rate, identifies suspect commits, and creates Linear tickets with reproduction context.

House Recipe · Work · 10 min

PROMPT

Find and triage flaky tests across CI/CD pipelines for a DevOps engineer.

Goal: Help me find the tests causing the most CI noise and turn them into tickets with enough context that they can be fixed, not just retried.

Ask me for:
- Lookback window in days (default 30)
- CI providers in use (GitHub Actions, CircleCI, both)
- Repos to scope the audit to
- Flake threshold (e.g. failed at least 3 times while the same commit eventually passed)
- Whether to create Linear tickets for top offenders

Use available integrations this way:
- GitHub: pull GitHub Actions run history, job logs, and test reports
- CircleCI: pull pipeline history and test failure metadata
- Linear: create tickets for the worst offenders with reproduction notes
- Slack: post a summary to the team channel
- Google Docs: write the full report

Output:
1. Top 20 flakiest tests ranked by flake rate, frequency, and time-cost
2. For each test: failure history, suspect commits, last seen passing
3. Tests that flaked exactly once (likely real bugs, not flakes)
4. Linear tickets for the worst offenders with logs linked
5. A Slack summary for the team channel
6. The full report in Google Docs

Rules:
- Do not quarantine or skip tests directly; produce recommendations only
- Show the failure logs for top offenders, not just the test name
- Distinguish flaky tests from tests that fail on a specific environment
- Do not propose deleting a test without a documented reason
- If a test belongs to another team's repo, route the ticket to them
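The prompt's rule about separating flaky tests from environment-specific failures can be made concrete with a small classifier. This is only a sketch of one possible heuristic, not how OpenClaw decides: the function name `classify_failure_pattern`, the `(environment, passed)` input shape, and the four labels are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_failure_pattern(outcomes):
    """Classify one test from (environment, passed) pairs.

    Hypothetical heuristic: a test is "flaky" if any single environment
    shows both passes and failures, and "environment-specific" if some
    environments always fail while the others always pass.
    """
    # Collect the pass/fail history of the test per environment.
    by_env = defaultdict(list)
    for env, passed in outcomes:
        by_env[env].append(passed)

    # Environments with mixed results, and environments that never passed.
    mixed = [e for e, r in by_env.items() if any(r) and not all(r)]
    always_failing = [e for e, r in by_env.items() if not any(r)]

    if mixed:
        return "flaky"
    if not always_failing:
        return "healthy"
    if len(always_failing) == len(by_env):
        return "consistently failing"
    return "environment-specific"
```

A test that only ever fails on one runner image would come back as "environment-specific", which usually points at the runner or its dependencies rather than at nondeterminism in the test itself.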

How It Works

This recipe quantifies CI flake instead of leaving it to vibes. It pulls recent CI runs, identifies tests that fail and later pass on the same commit, ranks them by flake rate and developer time lost, and turns the worst offenders into tickets with enough context to fix.
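The core idea, counting failures only when the same commit also passed and then ranking, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the recipe's actual implementation: `CiTestResult` is a hypothetical provider-neutral record, run order within a commit is ignored, and "time lost" is approximated as the duration of the failed test runs rather than the full cost of re-running a job.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CiTestResult:
    """One test outcome from one CI run (hypothetical record shape)."""
    test_id: str
    commit_sha: str
    passed: bool
    duration_s: float


def rank_flaky_tests(results, min_failures=3):
    """Rank tests that both failed and passed on the same commit."""
    # Group outcomes by (test, commit) so retries of one commit sit together.
    by_test_commit = defaultdict(list)
    for r in results:
        by_test_commit[(r.test_id, r.commit_sha)].append(r)

    stats = defaultdict(lambda: {"runs": 0, "flaky_failures": 0, "lost_s": 0.0})
    for (test_id, _sha), runs in by_test_commit.items():
        s = stats[test_id]
        s["runs"] += len(runs)
        failures = [r for r in runs if not r.passed]
        # A failure counts as a flake only if the same commit also passed.
        if failures and any(r.passed for r in runs):
            s["flaky_failures"] += len(failures)
            s["lost_s"] += sum(r.duration_s for r in failures)

    ranked = [
        {
            "test_id": test_id,
            "flake_rate": s["flaky_failures"] / s["runs"],
            "flaky_failures": s["flaky_failures"],
            "lost_s": s["lost_s"],
        }
        for test_id, s in stats.items()
        if s["flaky_failures"] >= min_failures
    ]
    # Highest flake rate first; ties broken by time lost.
    ranked.sort(key=lambda row: (row["flake_rate"], row["lost_s"]), reverse=True)
    return ranked
```

Tests that fail on every run of a commit never enter the ranking, since a consistent failure is more likely a real regression than a flake. The `min_failures` default of 3 mirrors the example threshold in the prompt.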

What You Get

  • Top 20 flakiest tests ranked by flake rate, frequency, and time cost
  • Failure history, recent passing rate, and suspect commits per test
  • Tests that flaked once and may be real bugs
  • Linear tickets for the worst offenders with reproduction context
  • Slack post for the team channel

Setup Steps

  1. Ask OpenClaw to run the "CI/CD Flaky Test Hunter" recipe using the prompt above
  2. Connect GitHub, CircleCI, Linear, Slack, and Google Docs
  3. Set the lookback window (30 days is a good default)
  4. Triage the ticket list with test owners
  5. Set a flake budget so the team knows when CI needs dedicated cleanup time
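A flake budget from the last step can be as simple as a threshold on the share of CI runs that needed a retry to pass. The sketch below is one possible way to express it; the function name `flake_budget_status` and the 2% default are assumptions chosen for illustration, and each team would pick its own number.

```python
def flake_budget_status(flaky_runs, total_runs, budget=0.02):
    """Compare the observed share of flaky CI runs against an agreed budget.

    `budget` is a hypothetical default (2% of runs); adjust it per team.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    rate = flaky_runs / total_runs
    return {
        "flake_rate": rate,
        "budget": budget,
        # Over budget means CI cleanup should get dedicated time.
        "over_budget": rate > budget,
    }
```

For example, 12 flaky runs out of 400 is a 3% flake rate, which exceeds a 2% budget and would signal that cleanup work should be scheduled.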

Tips

  • A test that flakes once is suspicious. One that flakes ten times is broken.
  • Quarantining flaky tests can be useful, but track the count or they accumulate.
  • Sometimes the test is fine and the system under test is racy. Read before quarantining.
  • Run this monthly. Flake debt compounds when nobody measures it.
Tags: #devops #ci_cd #testing #flaky_tests #developer_experience #quality
