First Look

Know what you're dealing with before writing a single query

Drop any dataset — CSV, Excel, Parquet, database table — and get a complete profile in seconds: types, nulls, distributions, outliers, duplicates, correlations, and quality issues. The EDA you always do manually, automated.

Community · Submitted by Community · Work · 1 min

PROMPT

Create a skill called "First Look". When I give you a dataset (CSV, Excel, Parquet, or a database table name), produce a comprehensive data profile: (1) Overview: row count, column count, memory size, duplicate row count. (2) Per-column analysis: inferred type, null count and percentage, unique value count, min/max/mean/median/std for numeric columns, top 10 values for categorical columns, example values. (3) Distribution analysis: identify skewed distributions, uniform distributions, and constant columns. (4) Outlier detection: flag values more than 3 standard deviations from the mean or more than 1.5*IQR outside the quartiles. (5) Cross-column correlations: list numeric pairs with an absolute correlation above 0.7. (6) Data quality summary: an overall quality score and per-column scores based on completeness, uniqueness, and consistency. (7) Recommended next steps: specific cleaning actions for each issue found. For large datasets (>1M rows), profile a stratified sample and note the sampling approach.
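
The prompt names three ingredients for the quality score (completeness, uniqueness, consistency) but does not say how they are combined. As one hypothetical reading, the sketch below scores a column as a weighted average of the three. The function names, the weights, and the "consistency" definition (share of values that agree with the column's dominant type) are all assumptions made for illustration, not the skill's actual formula.

```python
def _is_number(text):
    """Return True if the value parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def column_quality(values, weights=(0.5, 0.2, 0.3)):
    """Score one column from 0 to 100. The weights are hypothetical."""
    total = len(values)
    present = [v for v in values if v not in ("", None)]  # empty string counts as null
    if total == 0 or not present:
        return 0.0
    completeness = len(present) / total
    uniqueness = len(set(present)) / len(present)
    numeric = sum(1 for v in present if _is_number(v))
    # Consistency: share of values matching the dominant kind (numeric vs. text).
    consistency = max(numeric, len(present) - numeric) / len(present)
    w_complete, w_unique, w_consistent = weights
    score = (w_complete * completeness
             + w_unique * uniqueness
             + w_consistent * consistency)
    return round(100 * score, 1)


def overall_quality(columns):
    """Average the per-column scores; `columns` maps name -> list of values."""
    scores = [column_quality(vals) for vals in columns.values()]
    return round(sum(scores) / len(scores), 1) if scores else 0.0


messy = ["1", "2", "", "x"]   # one null, one value of the wrong type
clean = ["a", "b", "c"]
messy_score = column_quality(messy)
clean_score = column_quality(clean)
```

Weighting uniqueness this way penalizes low-cardinality categorical columns, so a real scorer would probably apply it only to columns expected to be keys.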

How It Works

Every new dataset starts the same way: df.head(), df.describe(), df.info(), check for nulls, check for duplicates, look at value distributions... This skill does all of it in one pass and gives you a structured report.
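
To make the manual routine concrete, here is a minimal sketch of a one-pass column profile. It uses only the Python standard library so it runs anywhere; the skill itself may well use pandas or something else. The helper names (profile_columns, count_duplicate_rows) and the sample data are made up for this example, and treating empty strings as nulls is an assumption.

```python
import csv
import io
import statistics


def profile_columns(rows):
    """Profile a list of dict rows, e.g. the output of csv.DictReader."""
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        raw = [row[col] for row in rows]
        present = [v for v in raw if v not in ("", None)]  # assumption: "" means null
        nulls = len(raw) - len(present)
        info = {
            "nulls": nulls,
            "null_pct": round(100 * nulls / len(raw), 1),
            "unique": len(set(present)),
        }
        try:
            nums = [float(v) for v in present]
        except ValueError:
            nums = None  # at least one value is not a number
        if nums:
            info.update(type="numeric", min=min(nums), max=max(nums),
                        mean=statistics.mean(nums),
                        median=statistics.median(nums))
        else:
            info["type"] = "text"
        report[col] = info
    return report


def count_duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row."""
    seen = set()
    dupes = 0
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes


SAMPLE_CSV = """id,city,price
1,Oslo,10.5
2,Bergen,
3,Oslo,12.0
3,Oslo,12.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_columns(rows)
duplicates = count_duplicate_rows(rows)
```

On this sample, the price column comes back with one null (25%) and the last row is counted as one exact duplicate.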

What You Get

  • Column-level profiling: type, null rate, unique count, min/max/mean/median
  • Distribution summaries and histogram sketches for numeric columns
  • Top values and frequency counts for categorical columns
  • Duplicate row detection (exact and fuzzy)
  • Outlier flagging (values more than 3 standard deviations from the mean, or outside the 1.5*IQR fences)
  • Cross-column correlation matrix for numeric columns
  • Data quality score per column and overall
  • Suggested cleaning steps based on issues found
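
Two of the checks above are easy to sketch: outlier flagging and the strong-correlation filter. This is a rough illustration, not the skill's implementation. The function names and sample data are invented here, the z-score check uses the population standard deviation, the quartiles use the default method of statistics.quantiles, and the 0.7 cutoff is applied to the absolute Pearson correlation (all assumptions).

```python
import math
import statistics


def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant column
    return cov / (sx * sy)


def strong_pairs(columns, threshold=0.7):
    """Column pairs whose absolute correlation exceeds `threshold`."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


readings = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11] * 2 + [95]
columns = {
    "height": [1, 2, 3, 4, 5],
    "weight": [2, 4, 6, 8, 10.5],
    "noise": [3, 1, 4, 1, 5],
}
flagged_z = zscore_outliers(readings)
flagged_iqr = iqr_outliers(readings)
pairs = strong_pairs(columns)
```

Both methods flag the 95 in this sample. On small or heavily skewed columns they can disagree, which is one reason to report both.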

Setup Steps

  1. Ask your Claw to create a "First Look" skill with the prompt above
  2. Drop any data file or point it at a database table
  3. Get back a comprehensive profile report in under a minute

Tips

  • Run this on every new data source before doing any analysis
  • The quality score helps prioritize which columns need cleaning attention
  • The suggested cleaning steps feed directly into the CSV Doctor or Date Whisperer skills
  • For large datasets, it profiles a sample and notes where sampling might miss issues
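
For the large-dataset case, a stratified sample draws the same fraction from each group, so small groups do not vanish from the profile. The sketch below shows one way this could look. The function name, the choice of a single strata column, the fixed seed, and the "at least one row per group" rule are all assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, strata_key, fraction, seed=0):
    """Sample `fraction` of the rows from each group of `strata_key`."""
    rng = random.Random(seed)  # fixed seed so the profile is repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[row[strata_key]].append(row)
    sample = []
    for key in sorted(groups, key=str):
        members = groups[key]
        # Keep at least one row per group so rare categories stay visible.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rows = (
    [{"channel": "web", "amount": i} for i in range(900)]
    + [{"channel": "store", "amount": i} for i in range(100)]
    + [{"channel": "phone", "amount": i} for i in range(3)]
)
sample = stratified_sample(rows, "channel", fraction=0.1)
counts = Counter(row["channel"] for row in sample)
```

Here the sample holds 90 web rows, 10 store rows, and 1 phone row. A sample like this can still miss rare bad values inside a group, which is why the report should state how it was drawn.
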
Tags: #data-profiling #eda #data-quality #exploration