Right Sizer
Stop guessing CPU and memory limits — let actual usage data decide
Analyzes your pod resource usage over time and recommends properly sized requests and limits. Finds the pods running at 2% CPU with 4 cores requested, and the ones getting OOMKilled because limits are too low.
INGREDIENTS
PROMPT
Create a skill called "Right Sizer". Analyze Kubernetes pod resource usage and recommend proper sizing:

1. Run `kubectl top pods -n <namespace>` for current usage
2. Get current requests/limits from deployment specs
3. If Prometheus is available, query historical CPU and memory usage (p50, p95, p99)
4. Compare requested vs. actual for each pod/deployment

For each deployment, recommend:
- CPU request: p95 actual usage + 20% headroom
- CPU limit: p99 actual usage + 50% headroom (or no limit for non-critical)
- Memory request: p95 actual usage + 20% headroom
- Memory limit: p99 actual usage + 30% headroom (memory is less elastic than CPU)

Flag:
- Over-provisioned: request > 2x actual p95 usage
- Under-provisioned: actual p95 > 80% of limit
- No limits set: resource contention risk

Generate updated deployment YAMLs with the recommended values. Estimate the cost savings from right-sizing.
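The headroom rules in the prompt reduce to simple multipliers. A minimal sketch (the function name and units are illustrative; it assumes percentiles are already expressed in millicores and MiB):

```python
def recommend(cpu_p95, cpu_p99, mem_p95, mem_p99):
    """Apply the prompt's headroom rules to historical usage percentiles.

    Inputs are assumed to be CPU in millicores and memory in MiB.
    """
    return {
        "cpu_request": round(cpu_p95 * 1.20),  # p95 + 20% headroom
        "cpu_limit":   round(cpu_p99 * 1.50),  # p99 + 50% headroom
        "mem_request": round(mem_p95 * 1.20),  # p95 + 20% headroom
        "mem_limit":   round(mem_p99 * 1.30),  # p99 + 30% (memory is less elastic)
    }
```

For example, a pod observed at 200m CPU (p95) / 250m (p99) and 300Mi memory (p95) / 400Mi (p99) would get a 240m CPU request, 375m CPU limit, 360Mi memory request, and 520Mi memory limit.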
How It Works
Most Kubernetes pods are over-provisioned by 2-5x because nobody wants to
risk an OOMKill. This skill analyzes actual usage and recommends limits that
are safe but not wasteful.
What You Get
- Requested vs. actual resource usage for all pods in a namespace
- Over-provisioned pods (wasting money) with recommended lower limits
- Under-provisioned pods (risk of OOMKill/throttling) with recommended higher limits
- Pods with no limits set (resource contention risk)
- Total cluster waste estimate in dollars
- Updated deployment YAMLs with recommended values
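The three flags above are simple threshold checks. A hedged sketch of how they might be classified (function name is illustrative; request, limit, and p95 must be in the same unit, e.g. millicores):

```python
def classify(request, limit, p95):
    """Flag a workload per the thresholds above. limit=None means no limit set."""
    flags = []
    if limit is None:
        flags.append("no-limits")           # resource contention risk
    if p95 > 0 and request > 2 * p95:
        flags.append("over-provisioned")    # request > 2x actual p95 usage
    if limit is not None and p95 > 0.8 * limit:
        flags.append("under-provisioned")   # actual p95 > 80% of limit
    return flags
```

A pod requesting 4000m but peaking at 80m would be flagged over-provisioned; one peaking at 490Mi against a 512Mi limit would be flagged under-provisioned.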
Setup Steps
- Ensure metrics-server is running in your cluster
- Tell your Claw which namespace(s) to analyze
- Review the recommendations
- Apply changes gradually (start with the most over-provisioned)
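Once metrics-server is running, `kubectl top pods` prints a plain-text table that is easy to parse. A minimal sketch, assuming the default column layout (NAME, CPU in millicores like `12m`, memory in MiB like `256Mi`):

```python
def parse_top(output):
    """Parse `kubectl top pods` text output into {pod: (cpu_millicores, mem_mib)}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, mem = line.split()
        usage[name] = (int(cpu.rstrip("m")), int(mem.rstrip("Mi")))
    return usage
```

In practice you would feed this the captured stdout of `kubectl top pods -n <namespace>`; pods reporting in other units (e.g. `Gi`) would need extra handling.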
Tips
- Metrics-server gives real-time snapshots; Prometheus gives historical data (better for sizing)
- Always set requests (scheduling guarantee) even if you're flexible on limits
- Watch for burstable workloads — they need higher limits than average usage suggests
- Apply changes during low-traffic periods and monitor for a few days
- Consider VPA (Vertical Pod Autoscaler) for automatic ongoing adjustment
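The cost-savings estimate mentioned in the prompt can be roughed out from the per-replica reductions. A sketch with hypothetical unit prices (the $/core and $/GiB figures below are placeholders; substitute your cloud provider's actual rates):

```python
def monthly_savings(reductions, cpu_cost_per_core=25.0, mem_cost_per_gib=3.5):
    """Estimate monthly savings from right-sizing.

    reductions: list of (replicas, cpu_cut_cores, mem_cut_gib) tuples, where the
    cuts are the per-replica reduction in requested CPU cores and memory GiB.
    Unit prices are hypothetical monthly rates, not any provider's real pricing.
    """
    return sum(
        replicas * (cpu_cut * cpu_cost_per_core + mem_cut * mem_cost_per_gib)
        for replicas, cpu_cut, mem_cut in reductions
    )
```

For example, trimming 1.5 cores and 2 GiB from each of 3 replicas would save roughly 3 x (1.5 x $25 + 2 x $3.50) = $133.50/month at these placeholder rates.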