Spreadsheet Rescue

Turn messy spreadsheets into clean, analyzable data

Handles the spreadsheets from hell — merged cells, multiple header rows, color-coded data, footnotes mixed with values, data starting at cell B7. Extracts clean tabular data from Excel files without forcing you to reverse-engineer the formatting first.

Community | Submitted by Community | Work | 1 min

PROMPT

Create a skill called "Spreadsheet Rescue". When I give you an Excel file, don't just try to read it blindly. First, inspect every sheet: find the actual data region (it might not start at A1), detect merged cells, identify multi-row headers, and spot total/subtotal rows mixed with data. Then clean it: unmerge cells by propagating values where appropriate, flatten multi-row headers into single-row column names, remove summary rows, and if formatting is being used as data (for example, colored rows mean "overdue"), extract that meaning into an explicit column whenever possible. Output a clean CSV for each data region found, plus a data dictionary describing each column. If the file contains formulas, document the business logic they appear to encode.

How It Works

Business users build spreadsheets for humans to read, not machines to parse. You get merged cells everywhere, header rows spanning three lines, totals mixed in with data rows, and color used as a signal instead of a column. This skill detects the real data region, normalizes the sheet structure, and gives you clean tabular output you can actually analyze.
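
As a rough illustration of the "detect the real data region" step, the sketch below scans a sheet that has already been loaded as a list of rows (for example with openpyxl's `ws.iter_rows(values_only=True)`). The function name `find_data_region`, the `min_filled` threshold, and the longest-dense-run heuristic are assumptions made for this example, not a description of what the skill does internally.

```python
from typing import Optional, Sequence

Cell = Optional[object]


def is_filled(value: Cell) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return value is not None and str(value).strip() != ""


def find_data_region(rows: Sequence[Sequence[Cell]], min_filled: int = 2):
    """Return (first_row, last_row, first_col, last_col), 0-based and inclusive.

    Heuristic (an assumption for this sketch): the data region is the longest
    run of consecutive rows that each hold at least `min_filled` non-empty
    cells. Title lines and footnotes usually occupy one cell, so they fall
    outside the run.
    """
    best = None   # (start, end) of the longest dense run seen so far
    start = None  # start index of the run currently being scanned
    for i, row in enumerate(list(rows) + [[]]):  # empty sentinel closes the last run
        dense = sum(is_filled(v) for v in row) >= min_filled
        if dense and start is None:
            start = i
        elif not dense and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    if best is None:
        return None
    top, bottom = best
    cols = [j for r in rows[top:bottom + 1] for j, v in enumerate(r) if is_filled(v)]
    return top, bottom, min(cols), max(cols)


# Hypothetical sheet: a title in B1, the table at B3:D5, a footnote in B7.
sheet = [
    [None, "Quarterly Report", None, None],
    [None, None, None, None],
    [None, "Region", "Q1", "Q2"],
    [None, "North", 10, 12],
    [None, "South", 7, 9],
    [None, None, None, None],
    [None, "* unaudited figures", None, None],
]
region = find_data_region(sheet)
print(region)  # (2, 4, 1, 3)
```

A density heuristic like this will miss sparse tables and sheets holding several tables of equal length, so treat it as a starting point for inspection rather than a complete detector.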

What You Get

  • Auto-detection of data regions across multiple sheets
  • Merged cell resolution (fills down/right as appropriate)
  • Multi-row header detection and flattening
  • Removal of total/subtotal rows
  • Extraction of color-coded meaning into explicit columns when the workbook makes that possible
  • Clean CSV output per data region (the prompt asks for CSV; request Parquet explicitly if you need it)
  • A data dictionary generated from the extracted structure
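
The cleaning steps in the list above can be sketched on plain Python lists, assuming the header rows and body rows of one data region have already been pulled out and that merged cells appear as a value followed by `None` blanks. All function names here (`flatten_headers`, `fill_down`, `drop_total_rows`) and the "total"/"subtotal" label markers are hypothetical choices for illustration.

```python
def fill_merged_headers(row):
    """Propagate a merged header value rightwards across the blanks it covered."""
    filled, last = [], None
    for value in row:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def flatten_headers(header_rows, sep="_"):
    """Join stacked header rows into one name per column."""
    # Upper rows are filled rightwards; the bottom row is taken as-is.
    upper = [fill_merged_headers(r) for r in header_rows[:-1]]
    levels = upper + [list(header_rows[-1])]
    names = []
    for parts in zip(*levels):
        labels = [str(p).strip() for p in parts if p is not None]
        names.append(sep.join(labels))
    return names


def drop_total_rows(rows, label_col=0, markers=("total", "subtotal")):
    """Remove rows whose label cell starts with a summary marker."""
    kept = []
    for row in rows:
        label = str(row[label_col] or "").strip().lower()
        if not label.startswith(markers):
            kept.append(row)
    return kept


def fill_down(rows, col):
    """Propagate a vertically merged value down column `col`."""
    last, out = None, []
    for row in rows:
        row = list(row)  # copy so the input rows are not mutated
        if row[col] is None:
            row[col] = last
        else:
            last = row[col]
        out.append(row)
    return out


# Hypothetical region: "2023" is merged across two columns, "North" across two rows.
header_rows = [
    ["Region", "Product", "2023", None],
    [None, None, "Q1", "Q2"],
]
body = [
    ["North", "Widgets", 10, 12],
    [None, "Gadgets", 4, 6],
    ["Subtotal North", None, 14, 18],
    ["South", "Widgets", 7, 9],
    ["Total", None, 21, 27],
]

names = flatten_headers(header_rows)
clean = fill_down(drop_total_rows(body), col=0)
print(names)  # ['Region', 'Product', '2023_Q1', '2023_Q2']
print(clean)  # [['North', 'Widgets', 10, 12], ['North', 'Gadgets', 4, 6], ['South', 'Widgets', 7, 9]]
```

Summary rows are dropped before filling down so that a subtotal label never gets propagated into the data rows below it. Matching on a label prefix is crude (a region literally named "Totalia" would be removed), so a real pass would likely also check whether the row's numbers equal the sum of the rows above.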

Setup Steps

  1. Ask your Claw to create a "Spreadsheet Rescue" skill with the prompt above
  2. Drop any messy Excel file into the conversation
  3. Get back one clean CSV per data region, plus a summary of what was found
  4. Optionally get a pandas read script for recurring files from the same source

Tips

  • Especially useful for finance and operations spreadsheets with heavy formatting
  • For recurring files, save the extraction config so future files process faster
  • Works on .xlsx, .xls, and .xlsm files
  • Pairs well with CSV Doctor for downstream validation
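
For recurring files, one possible shape for a saved extraction config is a small JSON file that records what inspection found and can be turned into arguments for `pandas.read_excel`. The `ExtractionConfig` class, its field names, and the file name below are hypothetical. The keyword names (`sheet_name`, `skiprows`, `header`, `usecols`, `skipfooter`) are real `read_excel` parameters, but the sketch assumes header indices count from the first row left after `skiprows`, which is worth verifying against a sample file.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ExtractionConfig:
    """Hypothetical record of what inspection found for one recurring source."""
    sheet_name: str
    first_row: int        # 0-based index of the first header row
    header_rows: int      # number of stacked header rows
    columns: str          # Excel column range, e.g. "B:E"
    footer_rows: int = 0  # trailing footnote or total rows to ignore

    def save(self, path):
        Path(path).write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path):
        return cls(**json.loads(Path(path).read_text()))

    def read_excel_kwargs(self):
        """Build keyword arguments intended for pandas.read_excel."""
        if self.header_rows == 1:
            header = 0
        else:
            header = list(range(self.header_rows))  # multi-row header
        return {
            "sheet_name": self.sheet_name,
            "skiprows": self.first_row,
            "header": header,
            "usecols": self.columns,
            "skipfooter": self.footer_rows,
        }


# Example: a sheet whose two-row header starts at B7 and ends with one footnote row.
config = ExtractionConfig(sheet_name="Summary", first_row=6, header_rows=2,
                          columns="B:E", footer_rows=1)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "finance_export.json"  # hypothetical config file name
    config.save(path)
    restored = ExtractionConfig.load(path)

kwargs = restored.read_excel_kwargs()
print(kwargs)
# With pandas and an Excel engine installed, the next file from the same
# source could then be read with:
#   df = pd.read_excel("next_report.xlsx", **kwargs)
```

Keeping the config as data rather than as a hard-coded script makes it easy to notice when a source changes its layout: re-run inspection, compare the new values with the saved ones, and update the file.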
Tags: #data-cleaning #excel #spreadsheet #automation