KiloClaw

Flaky Root Cause Hunter

Find the real source of nondeterminism, not just the symptom

Diagnose flaky tests by categorizing nondeterminism (timing, order-dependence, shared state, network, resource contention) and applying targeted fixes.

Community · Submitted by Community · Work · 12 min

INGREDIENTS

🐙 GitHub

PROMPT

Create a skill called "Flaky Root Cause Hunter".

Given:
- A flaky test name/file and recent failure logs
- The test type (unit/integration/e2e) and environment (CI/local)

Output:
- A likely root-cause category and how to confirm it
- Concrete fix patterns for that category
- A verification plan (stress reruns, seed capture, isolation)

How It Works

This recipe structures flake debugging so teams stop inflating timeouts and start removing nondeterminism from tests and environments.
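
To make "removing nondeterminism" concrete, here is a minimal sketch of one fix pattern: injecting a deterministic clock instead of sleeping or raising a timeout. `TokenCache` and `FakeClock` are hypothetical names invented for this illustration, not part of any library or of the recipe itself.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Hypothetical unit under test: a value that expires after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injected so tests can control time
        self._value: Optional[str] = None
        self._stored_at = 0.0

    def put(self, value: str) -> None:
        self._value = value
        self._stored_at = self._clock()

    def get(self) -> Optional[str]:
        age = self._clock() - self._stored_at
        if self._value is not None and age < self._ttl:
            return self._value
        return None


class FakeClock:
    """Hypothetical test double: time moves only when the test says so."""

    def __init__(self, start: float = 0.0):
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def test_token_expires() -> None:
    clock = FakeClock()
    cache = TokenCache(ttl=30.0, clock=clock)
    cache.put("abc")
    clock.advance(29.0)          # still inside the TTL
    assert cache.get() == "abc"
    clock.advance(1.0)           # exactly at the TTL boundary: expired
    assert cache.get() is None
```

The test never calls `time.sleep`, so a slow CI runner cannot change its outcome. The trade-off is that production code must accept the clock as a parameter.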

Triggers

  • Quarantined flaky tests accumulate
  • Retries/timeouts are used as the primary fix
  • Failures are hard to reproduce locally

Steps

  1. Re-run the single test repeatedly and record the failure rate and failure modes.
  2. Categorize the flake:
     • race/timing,
     • order dependence,
     • shared global state,
     • network/IO instability,
     • resource contention.
  3. Apply fixes:
     • hermetic test data,
     • explicit waits and deterministic clocks,
     • isolate shared state,
     • remove external network dependencies or mock safely.
  4. Add instrumentation to tests (timestamps, retry counts, random seeds).
  5. Confirm the fix with stress reruns, then remove the quarantine tag.
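
The rerun, seed-capture, and confirmation steps above can be sketched as a small harness. This is an illustrative sketch, not a prescribed tool: `stress_rerun`, `RerunReport`, and `flaky_example` are hypothetical names, and it assumes the test can be called as a plain function that takes a seeded `random.Random`.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RerunReport:
    runs: int
    failures: int = 0
    modes: Counter = field(default_factory=Counter)         # failure mode -> count
    failing_seeds: List[int] = field(default_factory=list)  # seeds to replay later

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0


def stress_rerun(
    test_fn: Callable[[random.Random], None],
    runs: int = 200,
    base_seed: int = 0,
) -> RerunReport:
    """Run test_fn `runs` times, each with its own recorded seed."""
    report = RerunReport(runs=runs)
    for i in range(runs):
        seed = base_seed + i
        try:
            test_fn(random.Random(seed))
        except Exception as exc:  # AssertionError is included
            report.failures += 1
            report.modes[f"{type(exc).__name__}: {exc}"] += 1
            report.failing_seeds.append(seed)
    return report


def flaky_example(rng: random.Random) -> None:
    """Hypothetical flaky test: fails on a small fraction of seeds."""
    assert rng.random() >= 0.05, "simulated race lost"


if __name__ == "__main__":
    report = stress_rerun(flaky_example, runs=200)
    print(f"failure rate: {report.failure_rate:.1%}")
    print(f"modes: {dict(report.modes)}")
    print(f"replay with seeds: {report.failing_seeds}")
```

Because each run records its seed, any failure can be replayed exactly with `flaky_example(random.Random(seed))`. After a fix, the same harness should report a failure rate of zero over the same seeds before the quarantine tag comes off.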

Expected Outcome

  • A lower flake rate and restored trust in CI.
  • Less time spent rerunning, more time shipping.

Example Inputs

  • "This test fails 1/20 runs only on CI."
  • "E2E flake occurs when CI is slow."
  • "Order-dependent failures in a shared DB test suite."

Tips

  • If you can't explain why the test failed, you haven't fixed the flake yet.

Tags: #flaky-tests #testing #debugging #ci-cd