Back to Cookbook

Pandas Pit Stop

Find and fix the slow, memory-hungry parts of your pandas code

Paste your pandas script and get a performance audit — which operations are slow, what's eating memory, where you're using apply() when vectorized ops would be much faster, and where you should switch to Polars or DuckDB.

Submitted by Community · Work · 1 min

PROMPT

Create a skill called "Pandas Pit Stop". When I paste Python code that uses pandas, analyze it for performance and memory issues. Specifically check for: (1) apply(), iterrows(), or itertuples() calls that could be vectorized — show the vectorized alternative. (2) Memory waste from suboptimal dtypes (object columns that should be category, int64 that could be int32, etc.) — show the optimized read_csv() call. (3) Operations that will blow up memory on large data (merge creating cartesian products, concat without cleanup). (4) SettingWithCopyWarning risks — show the .loc[] or .copy() fix. (5) Chained operations that are hard to debug — insert intermediate shape checks. (6) Cases where Polars or DuckDB would be significantly better — provide the translated code. Always estimate the speedup and memory savings for each suggestion.
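Check (5) above, breaking up hard-to-debug chains, can be illustrated with a short sketch. The frame names and columns here are illustrative, not from the skill itself:

```python
import pandas as pd

orders = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
weights = pd.DataFrame({"key": ["a", "b"], "w": [10, 20]})

# Instead of one long method chain, break it up and assert shapes
# between steps so a surprise row-count change fails immediately.
merged = orders.merge(weights, on="key", how="left")
assert len(merged) == len(orders), "merge changed the row count"

result = merged.assign(score=lambda d: d["val"] * d["w"])
assert result.shape == (3, 4)
```

The intermediate asserts cost almost nothing at runtime but pinpoint which step in a chain produced an unexpected shape.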

How It Works

Pandas code that works fine on 10,000 rows chokes on 1 million. apply() loops are secretly Python-speed. merge() duplicates memory. read_csv() picks the worst dtypes. This skill profiles your code and tells you exactly what to fix.
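The apply() problem is the classic case. A minimal sketch (column names invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.random(50_000) * 100,
    "qty": rng.integers(1, 10, 50_000),
})

# Slow: .apply(axis=1) calls a Python function once per row
slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Fast: the vectorized multiply runs in compiled NumPy code and is
# typically orders of magnitude quicker on data this shape
fast = df["price"] * df["qty"]
```

Both produce identical results; only the second stays out of the Python interpreter loop.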

What You Get

  • Line-by-line performance annotations
  • apply()/iterrows()/itertuples() detection with vectorized alternatives
  • Memory profiling: estimated RAM usage per operation
  • dtype optimization (object → category, int64 → int32, etc.)
  • SettingWithCopyWarning detection and fixes
  • Suggestions for when to switch to Polars, DuckDB, or chunked processing
  • Refactored code with before/after benchmarks
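The dtype optimization in the list above is easy to verify yourself. A minimal sketch of the object-vs-category saving (column and file names are illustrative):

```python
import pandas as pd

# A low-cardinality string column stored two ways
s_obj = pd.Series(["red", "green", "blue"] * 100_000)  # object dtype
s_cat = s_obj.astype("category")

obj_bytes = s_obj.memory_usage(deep=True)
cat_bytes = s_cat.memory_usage(deep=True)

# category stores small integer codes plus one lookup table,
# so it is usually several times smaller than object strings.

# The same saving at read time, so the wide types are never allocated:
# df = pd.read_csv("data.csv", dtype={"color": "category", "count": "int32"})
```

On this toy column, the category version uses a fraction of the object version's memory; on wide real-world frames the cumulative saving is where the 50–80% figures come from.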

Setup Steps

  1. Ask your Claw to create a "Pandas Pit Stop" skill with the prompt below
  2. Paste your pandas script or notebook cells
  3. Optionally mention the data size ("this runs on ~5 million rows")
  4. Get back an annotated version with specific fixes
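One of the fixes you'll get back is the SettingWithCopyWarning repair from check (4). A minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Risky: chained indexing may assign into a temporary copy and
# trigger SettingWithCopyWarning (the write can silently vanish):
#   df[df["a"] > 1]["b"] = 0

# Safe: one .loc call selects and assigns on the original frame
df.loc[df["a"] > 1, "b"] = 0
```

After the `.loc` assignment, `df["b"]` is `[4, 0, 0]` in the original frame, which is what the chained version only sometimes achieves.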

Tips

  • Run this before scaling up from development data to production data
  • The dtype optimization alone can cut memory usage by 50–80%
  • The Polars/DuckDB suggestions include translated code, not just "use Polars"
  • Pairs well with the Notebook Cleanup skill for full notebook optimization
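Before scaling up, the merge blow-up from check (3) is worth testing explicitly. A sketch of how duplicate join keys multiply rows, and how `validate=` fails fast instead (data invented for illustration):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a"] * 1_000, "x": range(1_000)})
right = pd.DataFrame({"key": ["a"] * 1_000, "y": range(1_000)})

# Duplicate keys on both sides: 1,000 x 1,000 = 1,000,000 output rows
blown_up = left.merge(right, on="key")

# validate= raises MergeError up front instead of silently
# exploding memory on the full-size data
try:
    left.merge(right, on="key", validate="one_to_one")
    caught = False
except pd.errors.MergeError:
    caught = True
```

On 1,000-row inputs this is a curiosity; on 5 million rows it is an out-of-memory crash, which is why checking key uniqueness (or passing `validate=`) before scaling up pays off.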
Tags: #python #pandas #performance #optimization