First Look
Know what you're dealing with before writing a single query
Drop any dataset — CSV, Excel, Parquet, database table — and get a complete profile in seconds: types, nulls, distributions, outliers, duplicates, correlations, and quality issues. The EDA you always do manually, automated.
PROMPT
Create a skill called "First Look". When I give you a dataset (CSV, Excel, Parquet, or a database table name), produce a comprehensive data profile: (1) Overview: row count, column count, memory size, duplicate row count. (2) Per-column analysis: inferred type, null count and percentage, unique value count, min/max/mean/median/std for numeric columns, top 10 values for categorical columns, and example values. (3) Distribution analysis: identify skewed distributions, uniform distributions, and constant columns. (4) Outlier detection: flag values more than 3 standard deviations from the mean or more than 1.5*IQR beyond the quartiles. (5) Cross-column correlations: list numeric column pairs with an absolute correlation above 0.7. (6) Data quality summary: an overall quality score and per-column scores based on completeness, uniqueness, and consistency. (7) Recommended next steps: specific cleaning actions for each issue found. For large datasets (>1M rows), profile a stratified sample and note the sampling approach.
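The prompt leaves the formula behind the quality score in item (6) open. One possible reading is sketched below in pandas. The equal weights, the type-consistency measure, and the function names `column_quality` and `dataset_quality` are assumptions for illustration, not necessarily what the skill computes.

```python
import pandas as pd


def column_quality(s: pd.Series) -> dict:
    """Hypothetical per-column score: equal-weight mean of completeness and type consistency."""
    n = len(s)
    present = s.dropna()
    completeness = len(present) / n if n else 1.0
    if len(present):
        # Share of non-null values whose Python type matches the column's most common type
        type_names = present.map(lambda v: type(v).__name__)
        consistency = float(type_names.value_counts().iloc[0] / len(present))
    else:
        consistency = 1.0
    score = 100 * (completeness + consistency) / 2
    return {"completeness": completeness, "consistency": consistency, "score": round(score, 1)}


def dataset_quality(df: pd.DataFrame) -> dict:
    """Hypothetical overall score: mean of column scores, blended 50/50 with row uniqueness."""
    per_column = {col: column_quality(df[col]) for col in df.columns}
    # Uniqueness is measured on rows: the share that are not exact duplicates of an earlier row
    uniqueness = float(1 - df.duplicated().mean()) if len(df) else 1.0
    column_avg = sum(q["score"] for q in per_column.values()) / max(len(per_column), 1)
    overall = (column_avg + 100 * uniqueness) / 2
    return {"columns": per_column, "uniqueness": uniqueness, "overall": round(overall, 1)}
```

Uniqueness is scored on rows rather than per column because whether a column should be unique depends on its role: an ID column should be, a category column should not.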
How It Works
Every new dataset starts the same way: df.head(), df.describe(), df.info(),
then checking for nulls and duplicates and eyeballing value distributions.
This skill does all of it in one pass and returns a structured report.
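As a rough picture of what "one pass" replaces, here is a minimal pandas sketch that gathers the overview and per-column statistics into a single dict. The function name `profile` and the report layout are hypothetical; `inferred_type` comes from pandas' `infer_dtype`, and the `constant` flag and `skew` value are simple stand-ins for the distribution checks.

```python
import pandas as pd


def profile(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Collect in one pass what head/describe/info/isna/duplicated report piecemeal."""
    overview = {
        "rows": len(df),
        "columns": df.shape[1],
        # deep=True measures the real size of object (string) columns
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    columns = {}
    for name in df.columns:
        s = df[name]
        nulls = int(s.isna().sum())
        info = {
            "dtype": str(s.dtype),
            # pandas' own inference over non-null values, e.g. "integer", "string", "mixed"
            "inferred_type": pd.api.types.infer_dtype(s, skipna=True),
            "nulls": nulls,
            "null_pct": round(100 * nulls / len(s), 2) if len(s) else 0.0,
            "unique": int(s.nunique(dropna=True)),
            "examples": s.dropna().head(3).tolist(),
        }
        info["constant"] = info["unique"] <= 1
        if pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            info.update(min=s.min(), max=s.max(), mean=s.mean(),
                        median=s.median(), std=s.std(), skew=s.skew())
        else:
            # Categorical/text columns: most frequent values with their counts
            info["top_values"] = s.value_counts(dropna=True).head(top_n).to_dict()
        columns[name] = info
    return {"overview": overview, "columns": columns}
```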
What You Get
- Column-level profiling: inferred type, null rate, unique count, and min/max/mean/median/std for numeric columns
- Distribution summaries and histogram sketches for numeric columns
- Top values and frequency counts for categorical columns
- Duplicate row detection (exact and fuzzy)
- Outlier flagging (values more than 3 standard deviations from the mean, or more than 1.5 × IQR beyond the quartiles)
- Cross-column correlations for numeric columns, highlighting pairs with |r| above 0.7
- Data quality score per column and overall
- Suggested cleaning steps based on issues found
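The outlier and correlation rules in the list above reduce to a few lines of pandas. This is a sketch under assumptions: the 3σ rule, the 1.5 × IQR multiplier, and the 0.7 threshold come from the skill's prompt, correlations are Pearson (the `DataFrame.corr` default), and the helper names are hypothetical.

```python
import pandas as pd


def flag_outliers(s: pd.Series, z: float = 3.0, iqr_k: float = 1.5) -> pd.DataFrame:
    """Flag each non-null value under the z-score rule and the IQR-fence rule separately."""
    x = s.dropna()
    mean, std = x.mean(), x.std()  # sample standard deviation (ddof=1)
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    iqr = q3 - q1
    # A constant column (std == 0) or a single value (std is NaN) has no z-score outliers
    z_flag = (x - mean).abs() > z * std if std > 0 else pd.Series(False, index=x.index)
    iqr_flag = (x < q1 - iqr_k * iqr) | (x > q3 + iqr_k * iqr)
    return pd.DataFrame({"value": x, "z_outlier": z_flag, "iqr_outlier": iqr_flag})


def strong_correlations(df: pd.DataFrame, threshold: float = 0.7) -> list:
    """Return (col_a, col_b, r) for numeric pairs with |Pearson r| above the threshold."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only: each pair once, no self-pairs
            r = corr.loc[a, b]
            if pd.notna(r) and abs(r) > threshold:
                pairs.append((a, b, float(r)))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
```

Reporting both outlier rules matters on small samples: with n values, no point can have |z| larger than (n-1)/√n, so with 10 or fewer rows the 3σ rule can never fire, while the IQR fences still can.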
Setup Steps
- Ask your Claw to create a "First Look" skill with the prompt above
- Drop any data file or point it at a database table
- Get back a comprehensive profile report in under a minute
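Pointing the skill at either a file or a table implies some dispatch on the input. A minimal sketch, assuming the file extension decides the reader and anything without a known extension is a table name in a SQLite database (other databases would go through a SQLAlchemy connection instead); `load_dataset` is a hypothetical name.

```python
import sqlite3
from pathlib import Path
from typing import Optional

import pandas as pd


def load_dataset(source: str, conn: Optional[sqlite3.Connection] = None) -> pd.DataFrame:
    """Hypothetical loader: choose a pandas reader by extension, else treat source as a table."""
    suffix = Path(source).suffix.lower()
    if suffix in (".csv", ".tsv"):
        return pd.read_csv(source, sep="\t" if suffix == ".tsv" else ",")
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(source)    # first sheet; needs openpyxl (.xlsx) or xlrd (.xls)
    if suffix == ".parquet":
        return pd.read_parquet(source)  # needs pyarrow or fastparquet
    if conn is None:
        raise ValueError(f"{source!r} has no recognized extension and no database connection was given")
    # Table names cannot be bound as query parameters, so quote the identifier instead
    quoted = '"' + source.replace('"', '""') + '"'
    return pd.read_sql(f"SELECT * FROM {quoted}", conn)
```

Schema-qualified names (for example `main.orders`) would be mistaken for a file with an `.orders` extension and quoted as one identifier, so they would need splitting before this step.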
Tips
- Run this on every new data source before doing any analysis
- The quality score helps prioritize which columns need cleaning attention
- The suggested cleaning steps feed directly into the CSV Doctor or Date Whisperer skills
- For large datasets (over 1M rows), it profiles a stratified sample and notes where sampling might miss issues
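For the large-dataset case, the prompt asks for a stratified sample but does not say what to stratify on. A minimal sketch, assuming the caller picks a categorical column as the strata and wants roughly 100,000 rows; the defaults and the name `sample_for_profiling` are illustrative. `GroupBy.sample` needs pandas 1.1 or later.

```python
import pandas as pd


def sample_for_profiling(df: pd.DataFrame, strata: str, n: int = 100_000,
                         threshold: int = 1_000_000, seed: int = 0):
    """Return (frame_to_profile, note). Above the threshold, sample one fraction from every stratum."""
    if len(df) <= threshold:
        return df, "full dataset profiled (no sampling)"
    frac = n / len(df)
    # Sampling the same fraction per group keeps each stratum's share close to its share in df.
    # groupby drops rows whose strata value is NaN by default; fill those first if they matter.
    sample = df.groupby(strata).sample(frac=frac, random_state=seed)
    note = (f"stratified sample of {len(sample):,} of {len(df):,} rows ({frac:.1%}) by {strata!r}; "
            "very small strata can round down to zero rows, and rare values or outliers may be missed")
    return sample, note
```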