Backup Prober

Find out your backups are empty before you need them, not during the incident

Verifies your backups are actually restorable — checks completeness, freshness, and integrity. Runs automated restore tests to catch the silent failures that make backup confidence an illusion.

House Recipe · Work · 5 min

INGREDIENTS

💬 Slack · ✈️ Telegram

PROMPT

Create a skill called "Backup Prober". Verify backup health across my infrastructure:

1. Inventory all backup systems:
   - AWS RDS snapshots: `aws rds describe-db-snapshots`
   - AWS Backup: `aws backup list-backup-jobs`
   - Kubernetes Velero: `velero backup get`
   - Any custom backup scripts (check cron logs)
2. For each backup system, verify:
   - Last successful backup time (is it within the expected RPO?)
   - Backup size (is it reasonable? Has it changed dramatically?)
   - Backup completeness (are all expected databases/volumes included?)
3. If possible, run a restore test:
   - Restore to a test instance
   - Run basic data verification (row counts, key table spot checks)
   - Clean up test resources after verification
4. Generate a DR readiness report:
   - Actual RPO (time since last good backup)
   - Estimated RTO (based on backup size and restore process)
   - Gaps: what's not being backed up that should be

Flag any backup system outside its expected RPO as CRITICAL. If I don't provide an RPO target, default to 24 hours.
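The freshness rule that closes the prompt (anything outside its RPO is CRITICAL, with a 24-hour default) can be sketched in a few lines of Python. This is only an illustration of the rule, not what the skill actually runs. The function name `classify_freshness` and the choice to treat "no successful backup on record" as CRITICAL are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default RPO from the prompt, used when no target is provided.
DEFAULT_RPO = timedelta(hours=24)


def classify_freshness(last_success: Optional[datetime],
                       now: datetime,
                       rpo: Optional[timedelta] = None) -> str:
    """Return "CRITICAL" if the newest good backup is older than the RPO, else "OK"."""
    target = rpo if rpo is not None else DEFAULT_RPO
    if last_success is None:
        # Assumption: a system with no successful backup at all is outside any RPO.
        return "CRITICAL"
    age = now - last_success  # this is the "actual RPO" from the report
    return "CRITICAL" if age > target else "OK"


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    nine_hours_old = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
    thirty_three_hours_old = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
    print(classify_freshness(nine_hours_old, now))
    print(classify_freshness(thirty_three_hours_old, now))
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check easy to test against recorded backup timestamps.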

How It Works

GitLab found out 5 out of 5 backup methods had failed during their 2017 database outage. This skill makes sure that doesn't happen to you by regularly verifying backup health.

What You Get

  • Backup inventory: all backup jobs, schedules, and last run status
  • Freshness check: when was the last successful backup?
  • Completeness check: are all expected databases and volumes included, and is the backup size reasonable vs. data size?
  • Integrity check: checksums, consistency verification
  • Restore test: automated restore to a test instance with data verification
  • DR readiness report: RPO/RTO assessment based on actual backup state
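The data verification step of the restore test can be as simple as comparing per-table row counts recorded at backup time against the restored test instance. The sketch below assumes both sets of counts have already been collected into dictionaries. The function name `compare_row_counts` and the `tolerance` parameter are hypothetical, not part of the skill.

```python
from typing import Dict, List


def compare_row_counts(source: Dict[str, int],
                       restored: Dict[str, int],
                       tolerance: float = 0.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means the counts match.

    `tolerance` is the allowed relative difference per table (0.05 = 5%),
    useful when the source counts were not captured at the exact backup time.
    """
    problems = []
    for table, expected in sorted(source.items()):
        if table not in restored:
            problems.append(f"{table}: missing from restored copy")
            continue
        actual = restored[table]
        if abs(actual - expected) > expected * tolerance:
            problems.append(f"{table}: expected {expected} rows, restored {actual}")
    return problems


if __name__ == "__main__":
    source_counts = {"users": 1000, "orders": 5000}
    restored_counts = {"users": 1000}  # "orders" did not come back
    for problem in compare_row_counts(source_counts, restored_counts):
        print(problem)
```

Row counts only catch gross failures such as empty or missing tables, so they work best alongside the spot checks on key tables mentioned in the prompt.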

Setup Steps

  1. Tell your Claw about your backup systems (AWS Backup, RDS snapshots, Velero, custom scripts)
  2. Run the verification to get a baseline health report
  3. Set up a weekly or monthly schedule for ongoing verification
  4. Address any failures found in the report

Tips

  • A backup that hasn't been restore-tested is a hope, not a backup
  • Check backup sizes over time — a sudden size drop often means something broke
  • Verify cross-region or cross-account backups separately
  • Include both database and infrastructure state (Terraform, K8s) in your backup scope
  • Run the restore test in an isolated environment to avoid impacting production
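The tip about tracking backup sizes over time can be illustrated with a small check that compares each backup against the median of the ones before it. This is a minimal sketch. The function name `find_size_drops`, the five-backup window, and the 50% drop threshold are all assumptions to tune for your own data.

```python
from statistics import median
from typing import List, Sequence


def find_size_drops(sizes: Sequence[float],
                    window: int = 5,
                    max_drop: float = 0.5) -> List[int]:
    """Return indexes of backups that shrank sharply versus recent history.

    A backup is flagged when its size is below (1 - max_drop) times the
    median of up to `window` backups immediately preceding it.
    """
    flagged = []
    for i in range(1, len(sizes)):
        history = sizes[max(0, i - window):i]
        baseline = median(history)
        if baseline > 0 and sizes[i] < baseline * (1 - max_drop):
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Sizes in GB, oldest first; the fifth backup is suspiciously small.
    sizes_gb = [100, 102, 101, 103, 12, 104]
    print(find_size_drops(sizes_gb))
```

Using the median instead of the mean keeps one bad backup from dragging the baseline down and hiding the next one.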

Tags: #backups #disaster-recovery #reliability #devops