
Data Janitor

Messy CSV in, clean data out

Messy CSV, TSV, JSON, or spreadsheet export in; cleaned, normalized data out. A strong starter recipe because the before/after result is obvious and easy to verify.

House Recipe · Work · 1 min

PROMPT

Clean this data file for me. Auto-detect the encoding, delimiter, and quoting style. Then: (1) standardize all dates to ISO 8601 format, (2) normalize null values ("NULL", "N/A", "None", empty strings, "nan") to a consistent representation, (3) detect and flag type inconsistencies (e.g., a column that's mostly numbers but has some text), (4) remove duplicate rows, (5) report any rows that look anomalous. Return the cleaned data and a summary of what was fixed. Optionally generate a reusable Python script so I can run this again on future files. Data file: [paste or attach your data file]

How It Works

Your Claw inspects the file, auto-detects what's wrong (encoding, delimiter, quoting, date formats, mixed types), and generates a cleaning script or returns the cleaned data directly. Handles the stuff that makes data engineers cry.
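
As a rough illustration of the detection step, here is a minimal standard-library sketch. It is not necessarily what your Claw generates: the helper names (`detect_encoding`, `load_rows`) and the candidate encoding and delimiter lists are assumptions made for this example.

```python
import csv
import io

# Assumed candidates, tried in order. latin-1 never fails, so it acts as a fallback.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")
CANDIDATE_DELIMITERS = ",;\t|"


def detect_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes the bytes without error."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"


def load_rows(raw: bytes):
    """Decode the file, sniff its delimiter and quoting, and parse it into rows."""
    encoding = detect_encoding(raw)
    text = raw.decode(encoding)
    # csv.Sniffer infers the dialect from a sample of the text and raises
    # csv.Error if it cannot decide.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=CANDIDATE_DELIMITERS)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return encoding, dialect.delimiter, rows
```

Trying encodings in a fixed order is a heuristic: bytes that are valid in several encodings will be attributed to the first one that succeeds, so a quick look at the decoded text is still worthwhile.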

What You Get

  • Auto-detection of encoding (UTF-8, Latin-1, Windows-1252, etc.)
  • Delimiter inference (comma, semicolon, tab, pipe)
  • Date format standardization to ISO 8601
  • Null normalization: "NULL", "N/A", "None", "nan", and "" all mapped to one consistent representation
  • Schema validation and anomaly detection
  • Cleaned output in your preferred format (CSV, JSON, Parquet)
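
Several of the items above can be sketched in a few lines of Python. The version below is a hedged example under stated assumptions, not the script your Claw will produce: the function names, the null-token set, and the list of accepted date formats are all illustrative, and ambiguous day/month orders have to be decided per dataset.

```python
from datetime import datetime

# Assumed null spellings (compared case-insensitively) and accepted date formats.
NULL_TOKENS = {"", "null", "n/a", "none", "nan"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_null(value):
    """Map every null-like spelling to None; otherwise return the trimmed value."""
    if value is None or value.strip().lower() in NULL_TOKENS:
        return None
    return value.strip()


def to_iso_date(value):
    """Return an ISO 8601 date string, or the input unchanged if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value


def clean_rows(rows, date_columns):
    """Normalize nulls and dates, then drop rows that are exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        new_row = {key: normalize_null(val) for key, val in row.items()}
        for column in date_columns:
            if new_row.get(column) is not None:
                new_row[column] = to_iso_date(new_row[column])
        fingerprint = tuple(new_row.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(new_row)
    return cleaned


def flag_non_numeric(rows, column):
    """Return the indices of rows whose non-null value in `column` is not a number."""
    flagged = []
    for index, row in enumerate(rows):
        value = row[column]
        if value is None:
            continue
        try:
            float(value)
        except ValueError:
            flagged.append(index)
    return flagged
```

Deduplicating after normalization is a deliberate choice here: two rows that differ only in date format or null spelling collapse into one.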

Setup Steps

  1. Share your messy data file
  2. Optionally describe what the data should look like
  3. Get cleaned data and/or a reusable cleaning script

Tips

  • Works with CSV, TSV, JSON, JSONL, and Excel files
  • Ask for a Python/pandas script if you want a reusable pipeline
  • Handles Excel-mangled data (dates that became numbers, leading zeros stripped)
  • Can merge/join multiple messy files with different schemas
  • For recurring data sources, save the cleaning script and run it automatically
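
For the Excel-mangled cases mentioned above, a repair could look roughly like this. It is a sketch under assumptions: it presumes Excel's default 1900 date system, and the helper names and the `width` parameter are hypothetical. The original width of a zero-padded field cannot be recovered from the data, so you have to supply it.

```python
from datetime import date, timedelta

# In the 1900 date system, serial numbers from 61 upward count days from 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_iso(serial):
    """Convert an Excel date serial (for example "45366") to an ISO 8601 date."""
    days = int(float(serial))
    if days < 61:
        # Earlier serials are affected by Excel's fictitious 1900-02-29.
        raise ValueError("serials below 61 need special handling")
    return (EXCEL_EPOCH + timedelta(days=days)).isoformat()


def restore_leading_zeros(value, width):
    """Left-pad an all-digit value (such as a postal code) back to a known width."""
    return value.zfill(width) if value.isdigit() else value


print(excel_serial_to_iso("45366"))
print(restore_leading_zeros("2134", 5))
```
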
Tags: #data #csv #cleaning #automation #etl