Schema Scout
Find the right table in a warehouse with thousands of them
Ask "where is customer email stored?" and get an answer, not a 50-page data dictionary. Schema Scout scans your warehouse metadata and, where access allows, profiles tables to build a searchable semantic index, so you spend less time hunting for the right data.
PROMPT
Create a skill called "Schema Scout". Connect to my data warehouse and build a searchable catalog. For each table, record: name, schema, row count, column names and types, freshness (last updated), and any existing descriptions or comments. Where access allows, include sample values or profiling stats that help explain what the table contains. Generate plain-English descriptions for columns based on their names and data patterns (e.g., "cust_email" → "Customer email address, VARCHAR, 98% populated"). Build a semantic index so I can ask questions like "Where is customer email stored?" or "Which table has order revenue?" and get direct answers. Detect tables that appear to contain overlapping data and recommend canonical sources. Update the index on a weekly schedule.
How It Works
Large data warehouses have thousands of tables with names like `stg_crm_contacts_v2`
and `dim_customer_legacy_backup`. Nobody knows which customer table is canonical.
This skill builds a semantic index from your warehouse metadata so you can search
in plain English.
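To make the idea of plain-English search concrete, here is a minimal sketch. It uses simple keyword overlap as a stand-in for a real semantic index (which would more likely rely on embeddings), and the `ColumnEntry` type, the `search` function, the stopword list, and the sample catalog rows are all hypothetical names and data chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ColumnEntry:
    """One indexed column (hypothetical structure, not the skill's actual format)."""
    table: str
    column: str
    description: str


# Assumed filler words to ignore in questions; tune for your own phrasing.
STOPWORDS = {"where", "is", "the", "which", "table", "has", "stored", "a", "in", "of"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics (so underscores split too), drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def search(catalog: list[ColumnEntry], question: str, top_k: int = 3) -> list[ColumnEntry]:
    """Rank entries by how many question tokens appear in their name or description."""
    query = tokenize(question)
    scored = []
    for entry in catalog:
        doc = tokenize(f"{entry.table} {entry.column} {entry.description}")
        score = len(query & doc)
        if score:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


# Made-up sample catalog for illustration only.
catalog = [
    ColumnEntry("dim_customer", "cust_email", "Customer email address"),
    ColumnEntry("fct_orders", "order_revenue", "Order revenue in USD"),
    ColumnEntry("stg_crm_contacts_v2", "phone", "Contact phone number"),
]

for hit in search(catalog, "Where is customer email stored?"):
    print(f"{hit.table}.{hit.column}: {hit.description}")
```

Keyword overlap only matches words that literally appear in a name or description, which is why the generated descriptions matter: they add the plain-English terms that abbreviated column names lack.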
What You Get
- A searchable catalog of all tables and columns
- Plain-English descriptions generated from column names, data patterns, and usage
- Duplicate table detection (tables with overlapping data)
- Canonical source recommendations based on freshness, completeness, and usage
- Lineage tracing (which tables feed into which)
- Column-level search ("where is email address stored?")
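One plausible way to approach duplicate detection and canonical source recommendations is sketched below. It assumes overlap is measured as Jaccard similarity of column names and that the canonical table is the fresher, then more complete, of a pair; usage signals are left out because they depend on query logs. The `TableProfile` type, the 0.6 threshold, and the sample tables are illustrative assumptions, not the skill's actual method.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class TableProfile:
    """Summary of one table (hypothetical structure for this sketch)."""
    name: str
    columns: frozenset
    last_updated: date
    populated_ratio: float  # share of non-null cells, 0.0 to 1.0


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_overlaps(tables, threshold: float = 0.6):
    """Return table pairs whose column-name overlap meets the assumed threshold."""
    return [
        (left, right)
        for left, right in combinations(tables, 2)
        if jaccard(left.columns, right.columns) >= threshold
    ]


def pick_canonical(left: TableProfile, right: TableProfile) -> TableProfile:
    """Prefer the more recently updated table, then the more completely populated one."""
    return max((left, right), key=lambda t: (t.last_updated, t.populated_ratio))


# Made-up sample profiles for illustration only.
tables = [
    TableProfile("dim_customer",
                 frozenset({"customer_id", "email", "name", "created_at"}),
                 date(2024, 6, 1), 0.98),
    TableProfile("dim_customer_legacy_backup",
                 frozenset({"customer_id", "email", "name"}),
                 date(2022, 1, 15), 0.91),
    TableProfile("fct_orders",
                 frozenset({"order_id", "customer_id", "revenue"}),
                 date(2024, 6, 1), 0.99),
]

for left, right in find_overlaps(tables):
    winner = pick_canonical(left, right)
    print(f"{left.name} overlaps {right.name}; suggested canonical: {winner.name}")
```

Column-name overlap is a cheap first pass. Tables that share names but hold different rows would need a follow-up check on sampled values before anything is retired.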
Setup Steps
- Ask your Claw to create a "Schema Scout" skill with the prompt above
- Connect it to your data warehouse (Snowflake, BigQuery, Postgres, Redshift)
- Let it scan metadata and build the index (takes a few minutes for large warehouses)
- Ask natural-language questions about your data
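The metadata scan in the steps above can be pictured with the following sketch. It uses an in-memory SQLite database purely as a runnable stand-in; Snowflake, BigQuery, Postgres, and Redshift expose similar metadata through their own catalog views (typically `information_schema`), so the queries would differ there. The `scan_catalog` function and the sample table are hypothetical.

```python
import sqlite3


def scan_catalog(conn: sqlite3.Connection) -> list[dict]:
    """Collect table name, row count, and (column, type) pairs for every table."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        catalog.append({"table": table, "row_count": row_count, "columns": columns})
    return catalog


# Stand-in warehouse with one made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, cust_email VARCHAR)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'a@example.com'), (2, NULL)")

catalog = scan_catalog(conn)
for entry in catalog:
    print(entry["table"], entry["row_count"], entry["columns"])
```

On a real warehouse, exact `COUNT(*)` queries can be expensive, so reading row counts from the platform's table statistics, where available, is likely the better choice.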
Tips
- Run the full scan weekly to keep the index fresh
- Duplicate detection is especially valuable for warehouse cleanup, since it points to overlapping tables you can consolidate
- Share the generated catalog with new team members for faster onboarding
- Works best when combined with dbt docs: it fills in the descriptions dbt is missing
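As a rough illustration of how a missing description could be filled in from a column name and a profiling stat (the `cust_email` example from the prompt), here is a minimal rule-based sketch. The abbreviation map, the noun hints, and the `describe_column` function are hypothetical; the skill itself presumably relies on a language model and richer data patterns rather than a lookup table.

```python
# Hypothetical abbreviation map; extend it with your warehouse's naming conventions.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "amt": "amount",
    "dt": "date",
    "qty": "quantity",
}

# Hypothetical expansions for a trailing word that reads better spelled out.
NOUN_HINTS = {"email": "email address"}


def describe_column(name: str, data_type: str, non_null: int, total: int) -> str:
    """Build a short description from a column name, its type, and a null count."""
    words = [ABBREVIATIONS.get(tok, tok) for tok in name.lower().split("_") if tok]
    if words and words[-1] in NOUN_HINTS:
        words[-1] = NOUN_HINTS[words[-1]]
    label = " ".join(words).capitalize()
    if total == 0:
        # No profiling data, so no population percentage can be stated.
        return f"{label}, {data_type}, no rows profiled"
    percent = round(100 * non_null / total)
    return f"{label}, {data_type}, {percent}% populated"


print(describe_column("cust_email", "VARCHAR", 980, 1000))
# Customer email address, VARCHAR, 98% populated
```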