Data Janitor
Messy CSV in, clean data out
Messy CSV, TSV, JSON, or spreadsheet export in; cleaned, normalized data out. A strong starter recipe because the before/after result is obvious and easy to verify.
PROMPT
Clean this data file for me. Auto-detect the encoding, delimiter, and quoting style. Then: (1) standardize all dates to ISO 8601 format, (2) normalize null values ("NULL", "N/A", "None", empty strings, "nan") to a consistent representation, (3) detect and flag type inconsistencies (e.g., a column that's mostly numbers but has some text), (4) remove duplicate rows, (5) report any rows that look anomalous. Return the cleaned data and a summary of what was fixed. Optionally generate a reusable Python script so I can run this again on future files. Data file: [paste or attach your data file]
How It Works
Your Claw inspects the file, auto-detects its encoding, delimiter, and quoting
style, and identifies problems such as inconsistent date formats and
mixed-type columns. It then either returns the cleaned data directly or
generates a cleaning script. Handles the stuff that makes data engineers cry.
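As a rough illustration of the detection step, here is a minimal sketch using only the Python standard library. The function names (`detect_encoding`, `detect_dialect`, `read_rows`) and the list of candidate encodings are assumptions made for this example, not what your Claw actually runs. Trial decoding also cannot reliably tell Windows-1252 from Latin-1, so treat the reported encoding as a best guess.

```python
import csv

# Assumed candidate list, tried in order. UTF-8 is strict, so it goes first;
# cp1252 rejects a handful of undefined bytes; latin-1 accepts any byte
# sequence and therefore only works as a last resort.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def detect_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes the bytes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"


def detect_dialect(sample: str):
    """Infer delimiter and quoting from a text sample with csv.Sniffer."""
    return csv.Sniffer().sniff(sample, delimiters=",;\t|")


def read_rows(raw: bytes):
    """Decode the file, sniff its dialect, and parse it into rows."""
    encoding = detect_encoding(raw)
    text = raw.decode(encoding)
    dialect = detect_dialect(text[:4096])  # sniff only the first few KB
    rows = list(csv.reader(text.splitlines(), dialect))
    return encoding, dialect.delimiter, rows
```

`csv.Sniffer` is a heuristic and can raise `csv.Error` on very small or irregular samples, so a real script would want a fallback such as defaulting to a comma.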
What You Get
- Auto-detection of encoding (UTF-8, Latin-1, Windows-1252, etc.)
- Delimiter inference (comma, semicolon, tab, pipe)
- Date format standardization to ISO 8601
- Null normalization: "NULL", "N/A", "None", "nan", and "" all mapped to one consistent representation
- Schema validation and anomaly detection
- Cleaned output in your preferred format (CSV, JSON, Parquet)
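The null normalization, date standardization, and duplicate removal listed above could look roughly like the following sketch. It assumes rows arrive as dictionaries of strings (for example from `csv.DictReader`). The null tokens, the accepted date formats, and the names `normalize_null`, `to_iso_date`, and `clean_rows` are all illustrative choices. Note that an ambiguous date such as 03/04/2024 is read as month/day/year here, which may not suit your data.

```python
from datetime import datetime

# Assumed null spellings, compared case-insensitively after trimming.
NULL_TOKENS = {"", "null", "n/a", "na", "none", "nan"}

# Assumed input formats, tried in order; extend to match your data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%Y/%m/%d")


def normalize_null(value):
    """Map every null-like spelling to None; trim everything else."""
    if value is None or value.strip().lower() in NULL_TOKENS:
        return None
    return value.strip()


def to_iso_date(value):
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows, date_columns):
    """Normalize nulls and dates, drop exact duplicates, flag bad dates."""
    cleaned, flagged, seen = [], [], set()
    for index, row in enumerate(rows):
        out = {key: normalize_null(value) for key, value in row.items()}
        for column in date_columns:
            if out.get(column) is not None:
                iso = to_iso_date(out[column])
                if iso is None:
                    # Leave the value untouched and report it instead.
                    flagged.append((index, column, out[column]))
                else:
                    out[column] = iso
        fingerprint = tuple(out.items())
        if fingerprint not in seen:  # duplicates are judged after cleaning
            seen.add(fingerprint)
            cleaned.append(out)
    return cleaned, flagged
```

Deduplicating after normalization is a deliberate choice in this sketch: two rows that differ only in date format or null spelling count as the same row.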
Setup Steps
- Share your messy data file
- Optionally describe what the data should look like
- Get cleaned data and/or a reusable cleaning script
Tips
- Works with CSV, TSV, JSON, JSONL, and Excel files
- Ask for a Python/pandas script if you want a reusable pipeline
- Handles Excel-mangled data (dates that became numbers, leading zeros stripped)
- Can merge/join multiple messy files with different schemas
- For recurring data sources, save the cleaning script and schedule it to run on each new file
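For the Excel-mangled cases mentioned above, a repair might be sketched as follows. This assumes the workbook uses Excel's default 1900 date system (files saved with the 1904 system need a different epoch) and that you know the intended width of the zero-padded field, since stripped zeros cannot be recovered without it. The helper names are hypothetical.

```python
from datetime import date, timedelta

# In the 1900 date system Excel counts a nonexistent 1900-02-29 (serial 60),
# so using 1899-12-30 as the epoch gives correct dates for serials >= 61.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_iso(serial):
    """Convert an Excel date serial (e.g. "45292") to an ISO 8601 date."""
    days = int(float(serial))  # drop any time-of-day fraction
    if days < 61:
        raise ValueError(f"serial {serial!r} is too small to convert safely")
    return (EXCEL_EPOCH + timedelta(days=days)).isoformat()


def restore_leading_zeros(value, width):
    """Left-pad a code (ZIP, account number) to its assumed fixed width."""
    return value.zfill(width)
```

For example, `excel_serial_to_iso("45292")` returns `"2024-01-01"`, and `restore_leading_zeros("2134", 5)` returns `"02134"`.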