Pandas Pit Stop
Find and fix the slow, memory-hungry parts of your pandas code
Paste your pandas script and get a performance audit — which operations are slow, what's eating memory, where you're using apply() when vectorized ops would be much faster, and where you should switch to Polars or DuckDB.
PROMPT
Create a skill called "Pandas Pit Stop". When I paste Python code that uses pandas, analyze it for performance and memory issues. Specifically check for: (1) apply(), iterrows(), or itertuples() calls that could be vectorized — show the vectorized alternative. (2) Memory waste from suboptimal dtypes (object columns that should be category, int64 that could be int32, etc.) — show the optimized read_csv() call. (3) Operations that will blow up memory on large data (merge creating cartesian products, concat without cleanup). (4) SettingWithCopyWarning risks — show the .loc[] or .copy() fix. (5) Chained operations that are hard to debug — insert intermediate shape checks. (6) Cases where Polars or DuckDB would be significantly better — provide the translated code. Always estimate the speedup and memory savings for each suggestion.
How It Works
Pandas code that works fine on 10,000 rows chokes on 1 million. apply() loops run at Python speed, not C speed. merge() can silently duplicate memory. read_csv() defaults to memory-hungry dtypes like int64 and object. This skill audits your code and tells you exactly what to fix.
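As a minimal sketch of the apply()-vs-vectorized gap (the frame and column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 100k-row frame
df = pd.DataFrame({
    "price": np.random.rand(100_000) * 100,
    "qty": np.random.randint(1, 10, size=100_000),
})

# Slow: apply() with axis=1 calls a Python function once per row
slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Fast: vectorized arithmetic runs in compiled code on whole columns,
# often orders of magnitude faster at this size
fast = df["price"] * df["qty"]
```

Both produce the same Series; only the second scales.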
What You Get
- Line-by-line performance annotations
- apply()/iterrows()/itertuples() detection with vectorized alternatives
- Memory profiling: estimated RAM usage per operation
- dtype optimization (object → category, int64 → int32, etc.)
- SettingWithCopyWarning detection and fixes
- Suggestions for when to switch to Polars, DuckDB, or chunked processing
- Refactored code with before/after benchmarks
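The SettingWithCopyWarning fix is usually a one-liner. A minimal sketch with hypothetical data, showing the single .loc[] call that replaces chained indexing:

```python
import pandas as pd

df = pd.DataFrame({"score": [55, 72, 90], "passed": [False, False, False]})

# Risky: chained indexing may write to a temporary copy and raise
# SettingWithCopyWarning, silently leaving df unchanged:
# df[df["score"] >= 60]["passed"] = True

# Fix: one .loc[] call selects rows and column in a single operation
df.loc[df["score"] >= 60, "passed"] = True
```

The same pattern applies to any filtered assignment: select and assign in one .loc[] step, or take an explicit .copy() first.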
Setup Steps
- Ask your Claw to create a "Pandas Pit Stop" skill with the prompt below
- Paste your pandas script or notebook cells
- Optionally mention the data size ("this runs on ~5 million rows")
- Get back an annotated version with specific fixes
Tips
- Run this before scaling up from development data to production data
- The dtype optimization alone can cut memory usage by 50–80%
- The Polars/DuckDB suggestions include translated code, not just "use Polars"
- Pairs well with the Notebook Cleanup skill for full notebook optimization
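The dtype tip above can be sketched as follows (column names and sizes are hypothetical; the same dtypes can be passed to read_csv() up front):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a low-cardinality string column stored as object,
# and an int64 counter whose values fit comfortably in int32
n = 100_000
df = pd.DataFrame({
    "state": np.random.choice(["CA", "NY", "TX"], size=n),
    "count": np.random.randint(0, 1_000, size=n),
})

before = df.memory_usage(deep=True).sum()

# Same optimization applied at load time:
# pd.read_csv(path, dtype={"state": "category", "count": "int32"})
opt = df.astype({"state": "category", "count": "int32"})
after = opt.memory_usage(deep=True).sum()

print(f"{before:,} bytes -> {after:,} bytes")
```

Repeated strings as category and right-sized integers are where most of the 50–80% savings come from.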