Distributed Trace Hunt
Find missing spans and broken propagation in microservice flows
Debug microservices by enforcing end-to-end correlation (trace IDs, request IDs) and systematically locating propagation breaks or sampling gaps.
INGREDIENTS
PROMPT
Create a skill called "Distributed Trace Hunt".

Ask me for:
- The request journey (start → end) and involved services
- Current tracing stack (OpenTelemetry/Jaeger/etc.) and sampling approach
- One example trace/span screenshot or IDs (if available)

Output:
- A step-by-step trace propagation audit checklist
- Likely root-cause buckets and how to confirm each
- A minimal staging test plan that asserts end-to-end spans exist
How It Works
In distributed systems, partial traces are common. This recipe builds a repeatable
trace-driven debugging path: define the journey → validate propagation → close gaps.
Triggers
- Traces contain only some services (missing spans)
- You can't follow a single request end-to-end
- Production incidents require correlating logs/metrics/traces quickly
Steps
- Choose one representative request journey and record its expected service chain.
- Verify propagation headers and context injection/extraction per hop.
- Check sampling policy and exporter/backpressure settings.
- Add a "trace assertion" to a staging smoke test: a request should produce spans at every hop.
- Create a playbook for "missing spans" triage: sampling vs instrumentation vs async boundaries.
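The propagation check in the steps above can be sketched with the W3C `traceparent` header format, which OpenTelemetry uses by default. This is a minimal stdlib-only sketch, not a real instrumentation library: the service names and the `next_hop` simulation are hypothetical stand-ins for your actual hops.

```python
import re
import secrets

# W3C Trace Context: version-traceid-spanid-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def make_traceparent(sampled=True):
    """Build a fresh W3C traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"

def next_hop(incoming_headers):
    """Simulate one service hop: extract the incoming context and inject
    an outgoing traceparent (same trace id, new span id)."""
    m = TRACEPARENT_RE.match(incoming_headers.get("traceparent", ""))
    if m is None:
        # Propagation break: the trace restarts here with a fresh trace id.
        return {"traceparent": make_traceparent()}
    trace_id, _parent_span_id, flags = m.groups()
    return {"traceparent": f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"}

# Walk a hypothetical chain and check the trace id survives every hop.
headers = {"traceparent": make_traceparent()}
origin_trace_id = headers["traceparent"].split("-")[1]
for service in ["gateway", "service-a", "service-b"]:
    headers = next_hop(headers)
    hop_trace_id = headers["traceparent"].split("-")[1]
    assert hop_trace_id == origin_trace_id, f"propagation broke at {service}"
print("trace id preserved across all hops")
```

In a real audit, the equivalent check is: log or assert the incoming `traceparent` at each service's ingress and confirm the 32-hex-char trace id matches the one issued at the edge.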
Expected Outcome
- You can trace a request through the intended service chain reliably.
- Incidents become faster to debug and less dependent on tribal knowledge.
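The "trace assertion" that makes this outcome verifiable can be a small check in a staging smoke test: fetch the spans for a request's trace id from your tracing backend and fail if any expected service is missing. The span shape and service names below are a hypothetical simplification; adapt the fetch step to your Jaeger/Tempo/vendor query API.

```python
def assert_full_chain(spans, expected_services):
    """Fail if any expected service is missing from the trace.

    `spans` is a list of span dicts as returned (simplified here) by a
    tracing backend's query API.
    """
    seen = {span["service"] for span in spans}
    missing = [s for s in expected_services if s not in seen]
    assert not missing, f"missing spans from: {missing}"

# Hypothetical trace for one staging request; service-b's span is
# absent, so the assertion should fail and name it.
spans = [
    {"service": "gateway", "operation": "GET /checkout"},
    {"service": "service-a", "operation": "reserve-stock"},
]
try:
    assert_full_chain(spans, ["gateway", "service-a", "service-b"])
except AssertionError as e:
    print(f"trace assertion failed: {e}")
```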
Example Inputs
- "Our traces show the API gateway and service A, but nothing after."
- "Async job continuation loses the trace context."
- "We need to correlate frontend RUM with backend traces."
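The async example above ("Async job continuation loses the trace context") is typically fixed by carrying the context explicitly in the job payload, since thread-local or task-local context does not survive a queue boundary. A minimal sketch, assuming a queue that passes dicts; `current_traceparent`, `enqueue`, and `worker` are hypothetical names, not a real library API:

```python
import secrets
from queue import Queue

def current_traceparent():
    """Stand-in for reading the active trace context (e.g. from your
    OpenTelemetry SDK) at the point the job is enqueued."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"

def enqueue(queue, payload):
    # Snapshot the trace context and ship it with the job; without this,
    # the worker starts a brand-new, disconnected trace.
    queue.put({"traceparent": current_traceparent(), "payload": payload})

def worker(queue):
    job = queue.get()
    # Restore the carried context before starting the continuation span,
    # so the async work joins the original trace.
    return job["traceparent"], job["payload"]

q = Queue()
enqueue(q, {"order_id": 42})
ctx, payload = worker(q)
print(f"continuation joins trace {ctx.split('-')[1]}")
```

The same pattern applies to message brokers: put the serialized context in message headers at produce time and extract it at consume time.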
Tips
- Treat broken propagation as a correctness bug, not a logging preference.