Know what you're dealing with before writing a single query
Drop any dataset — CSV, Excel, Parquet, database table — and get a complete profile in seconds: types, nulls, distributions, outliers, duplicates, correlations, and quality issues. The EDA you always do manually, automated.
Create a skill called "First Look". When I give you a dataset (CSV, Excel, Parquet, or a database table name), produce a comprehensive data profile: (1) Overview: row count, column count, memory size, duplicate row count. (2) Per-column analysis: inferred type, null count and percentage, unique value count, min/max/mean/median/std for numeric columns, top 10 values for categorical columns, example values. (3) Distribution analysis: identify skewed distributions, uniform distributions, and constant columns. (4) Outlier detection: flag values more than 3 standard deviations from the mean or more than 1.5*IQR beyond the first or third quartile. (5) Cross-column correlations: report numeric pairs whose absolute correlation exceeds 0.7. (6) Data quality summary: an overall quality score and per-column scores based on completeness, uniqueness, and consistency. (7) Recommended next steps: specific cleaning actions for each issue found. For large datasets (>1M rows), profile a stratified sample and note the sampling approach.
Every new dataset starts the same way: df.head(), df.describe(), df.info(), check for nulls, check for duplicates, look at value distributions. This skill does all of it in one pass and gives you a structured report.
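As a rough illustration of the per-column and outlier steps in the prompt above, here is a minimal sketch using only Python's standard library. The function name profile_numeric, the returned keys, and the use of None for a missing value are assumptions made for this example; the quartiles come from the default method of statistics.quantiles, and the skill itself may compute any of these differently.

```python
import statistics


def profile_numeric(values):
    """Profile one numeric column. None marks a missing value (an assumption).

    Needs at least two non-missing values, because the sample standard
    deviation and the quartiles are undefined for fewer.
    """
    present = [v for v in values if v is not None]
    null_count = len(values) - len(present)

    mean = statistics.fmean(present)
    std = statistics.stdev(present)
    # Quartile cut points; the fences sit 1.5 * IQR outside Q1 and Q3.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "null_count": null_count,
        "null_pct": 100 * null_count / len(values),
        "unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "median": statistics.median(present),
        "std": std,
        # Rule 1: more than 3 standard deviations from the mean.
        "z_outliers": [v for v in present if abs(v - mean) > 3 * std],
        # Rule 2: outside the 1.5 * IQR fences.
        "iqr_outliers": [v for v in present if v < low_fence or v > high_fence],
    }


column = [10, 12, 11, 13, 12, 11, 10, 12, None, 95]
report = profile_numeric(column)
print(report["iqr_outliers"], report["z_outliers"])
```

On this sample column the IQR rule flags 95 while the 3-standard-deviation rule flags nothing, because a single extreme value in a short column inflates the standard deviation. Reporting both rules side by side is one way to make that kind of disagreement visible.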
Fix broken CSVs before they break your pipeline
Drop any CSV on your Claw and get a clean, validated file back. It detects encoding issues, mixed delimiters, malformed rows, broken quoting, BOM markers, and type mismatches — then fixes everything and tells you what it changed.
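A few of the checks named above (BOM marker, delimiter, malformed rows) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the skill's actual logic: the function name inspect_csv and the candidate delimiter list are hypothetical, the input is assumed to be non-empty UTF-8, and the delimiter is guessed with a deliberately simple heuristic (whichever candidate appears most often in the header line).

```python
import codecs
import csv
import io

# Hypothetical candidate list; extend it for other delimiters.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def inspect_csv(raw):
    """Report BOM presence, a guessed delimiter, and rows with a wrong field count."""
    had_bom = raw.startswith(codecs.BOM_UTF8)
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one.
    text = raw.decode("utf-8-sig")
    if not text.strip():
        raise ValueError("empty input")

    # Simple heuristic: the candidate that occurs most often in the header line.
    header_line = text.splitlines()[0]
    delimiter = max(CANDIDATE_DELIMITERS, key=header_line.count)

    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    expected = len(rows[0])
    # Record numbers are 1-based, with the header as record 1.
    bad_rows = [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]
    return {
        "had_bom": had_bom,
        "delimiter": delimiter,
        "expected_fields": expected,
        "bad_rows": bad_rows,
    }


sample = b"\xef\xbb\xbfid;name;city\n1;Ada;London\n2;Grace\n3;Linus;Helsinki\n"
print(inspect_csv(sample))
```

The sketch only detects problems; what to do with a short row (pad it, drop it, or stop) is a separate decision, and logging each choice is what makes a "what changed" report possible.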
Catch conflicting metric definitions before they reach a meeting
Scans your SQL queries, dbt models, and dashboard definitions to find every place a key metric is calculated. Flags inconsistencies — like marketing counting revenue with refunds and finance counting without — before they cause a trust-destroying meeting.
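To make the idea concrete, here is a small sketch of one way to spot conflicting definitions: collect every expression aliased to a metric name and group the sources by normalized expression. The function name find_metric_definitions, the file names, and the column names are all hypothetical, and the regular expression is a stand-in that only handles a function call with at most one level of nested parentheses; a real scan would more likely rely on a proper SQL parser.

```python
import re
from collections import defaultdict


def find_metric_definitions(sources, metric):
    """Map each normalized expression aliased AS <metric> to the sources using it."""
    # Matches e.g. SUM(amount - refund_amount) AS revenue, case-insensitively.
    pattern = re.compile(
        r"\b(\w+\s*\((?:[^()]|\([^()]*\))*\))\s+AS\s+" + re.escape(metric) + r"\b",
        re.IGNORECASE,
    )
    definitions = defaultdict(list)
    for name, sql in sources.items():
        for match in pattern.finditer(sql):
            # Crude normalization: lowercase and strip all whitespace.
            normalized = re.sub(r"\s+", "", match.group(1).lower())
            definitions[normalized].append(name)
    return {expr: sorted(names) for expr, names in definitions.items()}


sources = {
    "marketing_dashboard.sql": "SELECT SUM(amount) AS revenue FROM orders",
    "finance_model.sql": "select sum(amount - refund_amount) as revenue from orders",
    "weekly_report.sql": "SELECT  SUM( amount )  AS revenue FROM orders",
}
definitions = find_metric_definitions(sources, "revenue")
if len(definitions) > 1:
    for expression, names in definitions.items():
        print(expression, "<-", ", ".join(names))
```

More than one key in the result means the metric is computed in more than one way; in this made-up example the marketing and weekly queries include refunds while the finance query subtracts them.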
Wikipedia-grade AI pattern removal
Comprehensive AI writing cleanup based on Wikipedia's WikiProject AI Cleanup guidelines. Catches 24+ distinct patterns including inflated symbolism, em dash overuse, rule of three, copula avoidance, and sycophantic tone.
Real sources, named experts, actual quotes
Deep research that finds primary sources with named individuals, community sentiment from Reddit/HN/X, and news coverage. No summaries of summaries — actual quotes with URLs.