CSV Doctor
Fix broken CSVs before they break your pipeline
Drop any CSV on your Claw and get a clean, validated file back. It detects encoding issues, mixed delimiters, malformed rows, broken quoting, and BOM markers, fixes what it can, quarantines rows it can't repair, flags type mismatches, and tells you exactly what it changed.
PROMPT
Create a skill called "CSV Doctor". When I give you a CSV file, analyze it thoroughly before parsing it: detect the encoding (try UTF-8 first, then Windows-1252, then Latin-1 as a last resort, since Latin-1 accepts any byte sequence; plain ASCII is already valid UTF-8), identify the delimiter (comma, semicolon, tab, pipe), check for BOM markers, and scan for malformed rows (inconsistent column counts, unescaped quotes, embedded newlines). Fix every issue you can automatically: convert to UTF-8, normalize the delimiter to comma, fix quoting, and quarantine unfixable rows into a separate file. Output the clean CSV and a change log listing every fix applied, with row numbers. If the file has type issues (ZIP codes losing leading zeros, dates in mixed formats), flag those too.
How It Works
CSV files from vendors, partners, and internal teams arrive in every possible
state of disrepair — wrong encoding, semicolons instead of commas, unescaped
quotes, trailing whitespace, mixed line endings. Instead of debugging
`pd.read_csv()` errors for the hundredth time, toss the file at your Claw
and get a clean version back with a change log.
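The encoding and BOM checks described in the prompt can be sketched in Python with only the standard library. This is a minimal illustration, not the skill's actual implementation; `decode_csv_bytes` and `CANDIDATE_ENCODINGS` are hypothetical names. It strips a UTF-8 BOM if present, then tries each candidate encoding in order. Note that "decodes without error" is only a heuristic: a file that decodes as Windows-1252 could still be a different single-byte encoding.

```python
import codecs

# Hypothetical candidate list, tried in order. ASCII is omitted because it is
# a strict subset of UTF-8, and Latin-1 goes last because it maps every byte
# to a character, so no encoding listed after it would ever be tried.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_csv_bytes(raw: bytes) -> tuple[str, str, bool]:
    """Return (text, encoding_used, had_bom) for the raw bytes of a CSV file."""
    had_bom = raw.startswith(codecs.BOM_UTF8)
    if had_bom:
        raw = raw[len(codecs.BOM_UTF8):]  # drop the UTF-8 BOM before decoding
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding, had_bom
        except UnicodeDecodeError:
            continue  # bytes are invalid in this encoding; try the next one
    # Not reached in practice, because latin-1 decodes any byte sequence.
    raise ValueError("no candidate encoding could decode the input")
```

For example, `decode_csv_bytes("name;city\nJosé;Zürich\n".encode("cp1252"))` fails as UTF-8 (the byte for "é" is not a valid UTF-8 sequence there) and succeeds as `cp1252`. The encoding used and the BOM flag are exactly what a change log entry would need.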
What You Get
- Auto-detected encoding (UTF-8, Latin-1, Windows-1252, etc.) with conversion to UTF-8
- Delimiter detection (comma, semicolon, tab, pipe)
- Quarantine of unfixable rows into a separate file, with the reason for each
- BOM marker removal
- Consistent quoting and escaping
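- Flags for type issues such as ZIP codes losing leading zeros and mixed date formats
- A change log listing every fix applied, with row numbers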
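The delimiter detection, quoting normalization, quarantine, and change log features can be illustrated together with Python's `csv` module. This sketch assumes the text is already decoded. The delimiter heuristic (pick the candidate that splits the most rows into the same number of fields) is an illustrative choice, not necessarily what the skill does, and `detect_delimiter` and `clean_csv_text` are hypothetical names. It also assumes trimming leading and trailing whitespace from every field is safe, and it writes `\n` line endings (RFC 4180 specifies CRLF, so adjust if your consumers need that).

```python
import csv
import io
from collections import Counter

# Candidate delimiters named in the prompt: comma, semicolon, tab, pipe.
DELIMITERS = ",;\t|"


def detect_delimiter(sample: str) -> str:
    """Pick the candidate that splits the most rows into the same number of
    fields (more than one). Falls back to ',' if no candidate qualifies."""
    best, best_score = ",", 0
    for delim in DELIMITERS:
        rows = csv.reader(io.StringIO(sample, newline=""), delimiter=delim)
        widths = Counter(len(row) for row in rows if row)
        if not widths:
            continue
        width, count = widths.most_common(1)[0]
        if width > 1 and count > best_score:
            best, best_score = delim, count
    return best


def clean_csv_text(text: str) -> tuple[str, list[list[str]], list[str]]:
    """Return (clean_csv, quarantined_rows, change_log) for decoded text."""
    log: list[str] = []
    delimiter = detect_delimiter(text[:65536])  # inspect only the first 64 KB
    if delimiter != ",":
        log.append(f"delimiter: converted {delimiter!r} to ','")

    # newline="" lets the csv module handle \r, \n, \r\n and quoted newlines.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO()
    # The default QUOTE_MINIMAL quotes only fields that contain the delimiter,
    # a quote character, or a newline.
    writer = csv.writer(out, lineterminator="\n")
    quarantined: list[list[str]] = []

    header = next(reader, None)
    if header is None:
        return "", quarantined, log
    writer.writerow(header)

    for row in reader:
        line = reader.line_num  # source line on which this record ends
        if not row:
            log.append(f"line {line}: dropped blank line")
            continue
        trimmed = [field.strip() for field in row]
        if len(trimmed) != len(header):
            quarantined.append(trimmed)
            log.append(f"line {line}: quarantined, {len(row)} fields "
                       f"(expected {len(header)})")
            continue
        if trimmed != row:
            log.append(f"line {line}: trimmed whitespace")
        writer.writerow(trimmed)
    return out.getvalue(), quarantined, log
```

Counting consistent rows rather than raw character frequencies keeps one malformed row from throwing off detection, and because each candidate is parsed with a real CSV reader, delimiters inside quoted fields are not counted.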
Setup Steps
- Ask your Claw to create a "CSV Doctor" skill using the prompt above
- Drop any problematic CSV into the conversation
- Get back a clean file plus a summary of what was wrong
- Optionally, ask for Parquet output so column types stay explicit downstream
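Before any column is cast to a typed format such as Parquet, it helps to flag the type issues the prompt mentions. The following is a minimal heuristic sketch, not an exhaustive type checker. `flag_type_issues` and the three date patterns are hypothetical. The check treats any all-digit value that starts with `0` (like a ZIP code `02134`) as one that would lose data if converted to an integer, and reports a column as mixed when its values match more than one date layout.

```python
import re

# Hypothetical heuristics. An all-digit value with a leading zero
# (e.g. ZIP "02134") would lose that zero if cast to an integer.
LEADING_ZERO = re.compile(r"0\d+")
DATE_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "MM/DD/YYYY or DD/MM/YYYY": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "DD.MM.YYYY": re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}"),
}


def flag_type_issues(header: list[str], rows: list[list[str]]) -> list[str]:
    """Return human-readable warnings about columns that are risky to cast."""
    warnings = []
    for col, name in enumerate(header):
        values = [r[col] for r in rows if col < len(r) and r[col]]
        zero_padded = [v for v in values if LEADING_ZERO.fullmatch(v)]
        if zero_padded:
            warnings.append(
                f"{name}: {len(zero_padded)} value(s) like {zero_padded[0]!r} "
                "have leading zeros; keep this column as text"
            )
        # Collect every date layout seen in this column.
        formats = {
            label
            for v in values
            for label, pattern in DATE_PATTERNS.items()
            if pattern.fullmatch(v)
        }
        if len(formats) > 1:
            warnings.append(f"{name}: mixed date formats {sorted(formats)}")
    return warnings
```

Slash dates are deliberately grouped into one label, since `01/05/2024` is ambiguous between US and European order; deciding which one is meant needs a human or a source-specific rule.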
Tips
- Works great as a preprocessing step before any pandas or SQL ingestion
- The change log is useful for documenting data quality issues back to the source team
- For recurring files from the same source, save the detected rules (encoding, delimiter, expected column count) so future files are faster to clean
- Handles files up to several hundred MB by processing in chunks
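The last two tips fit together: once the rules for a source are known, a large file can be streamed row by row instead of loaded whole, so memory use stays flat regardless of file size. (pandas offers a similar pattern through the `chunksize` parameter of `pd.read_csv`.) The sketch below is an illustration under stated assumptions. The JSON rules file and its `encoding`, `delimiter`, and `width` keys are a hypothetical format, and `save_rules` and `clean_with_rules` are hypothetical names. Storing `utf-8-sig` for UTF-8 sources means a BOM is dropped if one is present.

```python
import csv
import json
from pathlib import Path


def save_rules(path: Path, encoding: str, delimiter: str, width: int) -> None:
    """Persist detected settings for a recurring source (hypothetical format)."""
    rules = {"encoding": encoding, "delimiter": delimiter, "width": width}
    path.write_text(json.dumps(rules), encoding="utf-8")


def clean_with_rules(src: Path, dst: Path, quarantine: Path, rules_path: Path) -> dict:
    """Stream src one record at a time using saved rules, writing clean rows
    to dst and wrong-width rows (prefixed with their line number) to quarantine."""
    rules = json.loads(rules_path.read_text(encoding="utf-8"))
    counts = {"kept": 0, "quarantined": 0}
    with open(src, encoding=rules["encoding"], newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout, \
         open(quarantine, "w", encoding="utf-8", newline="") as fbad:
        reader = csv.reader(fin, delimiter=rules["delimiter"])
        good = csv.writer(fout, lineterminator="\n")
        bad = csv.writer(fbad, lineterminator="\n")
        for row in reader:  # the reader pulls lines lazily from the file
            if len(row) == rules["width"]:
                good.writerow(row)  # the header row is counted here too
                counts["kept"] += 1
            elif row:  # skip blank lines, quarantine everything else
                bad.writerow([reader.line_num, *row])
                counts["quarantined"] += 1
    return counts
```

Because only the current record is held in memory, file size is limited by disk rather than RAM; the trade-off is that encoding and delimiter are not re-detected, so a source that changes its format needs its rules refreshed.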