Backup Prober
Find out your backups are empty before you need them, not during the incident
Verifies your backups are actually restorable — checks completeness, freshness, and integrity. Runs automated restore tests to catch the silent failures that make backup confidence an illusion.
INGREDIENTS
PROMPT
Create a skill called "Backup Prober". Verify backup health across my infrastructure:
1. Inventory all backup systems:
   - AWS RDS snapshots: `aws rds describe-db-snapshots`
   - AWS Backup: `aws backup list-backup-jobs`
   - Kubernetes Velero: `velero backup get`
   - Any custom backup scripts (check cron logs)
2. For each backup system, verify:
   - Last successful backup time (is it within the expected RPO?)
   - Backup size (is it reasonable? Has it changed dramatically?)
   - Backup completeness (are all expected databases/volumes included?)
3. If possible, run a restore test:
   - Restore to a test instance
   - Run basic data verification (row counts, key table spot checks)
   - Clean up test resources after verification
4. Generate a DR readiness report:
   - Actual RPO (time since last good backup)
   - Estimated RTO (based on backup size and restore process)
   - Gaps: what's not being backed up that should be
Flag any backup system outside its expected RPO as CRITICAL. If I don't provide an RPO target, default to 24 hours.
How It Works
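During its January 2017 database outage, GitLab discovered that none of
its five backup and replication methods were working reliably. This skill
reduces the risk of a similar surprise by verifying backup health on a
regular schedule, so failures surface before a restore is needed.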
What You Get
- Backup inventory: all backup jobs, schedules, and last run status
- Freshness check: when was the last successful backup?
- Completeness check: are all expected databases and volumes included, and is the backup size reasonable vs. data size?
- Integrity check: checksums, consistency verification
- Restore test: automated restore to a test instance with data verification
- DR readiness report: RPO/RTO assessment based on actual backup state
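The freshness check and the CRITICAL flag from the prompt can be sketched in a few lines of Python. This is only an illustration, not the skill's actual implementation: the function name `check_freshness`, the system names, and the timestamps are all hypothetical, and the sketch assumes the last-success times have already been collected from each backup system. The 24-hour default mirrors the prompt's fallback RPO.

```python
from datetime import datetime, timedelta, timezone

# Default RPO taken from the prompt: 24 hours when no target is provided.
DEFAULT_RPO = timedelta(hours=24)


def check_freshness(last_success, now, rpo=DEFAULT_RPO):
    """Return (status, age) for one backup system.

    last_success: timezone-aware datetime of the last successful backup,
                  or None if no successful backup was found.
    """
    if last_success is None:
        # No successful backup at all is treated as the worst case.
        return "CRITICAL", None
    age = now - last_success
    status = "CRITICAL" if age > rpo else "OK"
    return status, age


# Hypothetical inventory: system name -> last successful backup time.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
systems = {
    "rds-orders": datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
    "velero-cluster": datetime(2023, 12, 31, 3, 0, tzinfo=timezone.utc),
    "cron-pgdump": None,
}

report = {name: check_freshness(ts, now) for name, ts in systems.items()}
for name, (status, age) in report.items():
    print(name, status, age)
```

The age returned here is the "actual RPO" that the DR readiness report lists: the time since the last good backup, which is how much data would be lost if a restore were needed right now.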
Setup Steps
- Tell your Claw about your backup systems (AWS Backup, RDS snapshots, Velero, custom scripts)
- Run the verification to get a baseline health report
- Set up a weekly or monthly schedule for ongoing verification
- Address any failures found in the report
Tips
- A backup that hasn't been restore-tested is a hope, not a backup
- Check backup sizes over time — a sudden size drop often means something broke
- Verify cross-region or cross-account backups separately
- Include both database and infrastructure state (Terraform, K8s) in your backup scope
- Run the restore test in an isolated environment to avoid impacting production
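The tip about watching backup sizes over time could be automated roughly as follows. This is a minimal sketch under stated assumptions: the function name `size_dropped` and the 50% threshold are illustrative choices, not part of the skill, and the sizes are made-up sample values. It compares the newest backup against the median of the earlier ones, so a single odd backup in the history does not skew the baseline.

```python
from statistics import median


def size_dropped(sizes_gb, threshold=0.5):
    """Return True if the newest backup is suspiciously small.

    sizes_gb: backup sizes ordered oldest to newest.
    threshold: fraction of the historical median below which the newest
               backup is flagged (0.5 is an assumed default).
    """
    if len(sizes_gb) < 2:
        # Not enough history to compare against.
        return False
    *history, latest = sizes_gb
    baseline = median(history)
    return baseline > 0 and latest < threshold * baseline


# Hypothetical size histories in GB.
print(size_dropped([100, 102, 104, 12]))   # newest is far below the baseline
print(size_dropped([100, 102, 104, 101]))  # newest is in line with history
```

A flagged drop is a prompt to investigate rather than proof of failure, since legitimate data deletion or a retention change can also shrink a backup.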