NCBI SRA Download & FASTQ Extraction Optimizer
Make prefetch + fasterq-dump reliable and resource-aware.
Reduce hangs, excessive RAM use, and disk-space blowups when pulling sequencing reads from SRA. The approach is to follow the recommended prefetch-then-extract workflow, point temporary files at scratch, and check sizes before extracting.
Community · Submitted by Community · Work · 15 min
INGREDIENTS
🐙 GitHub · 🔍 Web
PROMPT
You are OpenClaw. Ask for accession IDs, filesystem constraints (home/scratch sizes), and the exact sra-tools commands used. Then propose an optimized prefetch + fasterq-dump workflow including temp-dir selection, thread/memory tuning, and output validation checks. Include guardrails (space estimates, vdb-validate integrity checks, and paired-end verification).
Pain point
fastq-dump and fasterq-dump can be extremely slow and memory-hungry. They can also fail partway through when temporary space runs out, especially on shared filesystems.
Repro/diagnostic steps
- Record the sra-tools version (fasterq-dump --version) and the exact command line used.
- Measure free space at both the temp and output locations (for example, scratch vs. home), including any per-user quotas.
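The two diagnostic steps above can be scripted so that every bug report or tuning attempt starts from the same facts. The following is a minimal sketch. The helper names tool_version and free_gib are hypothetical, and the scratch location is a placeholder to replace with your site's mount. Note that shutil.disk_usage reports free space on the filesystem, not your per-user quota, so on HPC systems you should also check the site's quota tool.

```python
from __future__ import annotations

import os
import shutil
import subprocess
import tempfile


def tool_version(tool: str = "fasterq-dump") -> str | None:
    """Return the first non-empty line of `<tool> --version`, or None if not on PATH."""
    exe = shutil.which(tool)
    if exe is None:
        return None
    res = subprocess.run([exe, "--version"], capture_output=True, text=True,
                         timeout=60, check=False)
    # Look at stdout first, then stderr, and return the first line with content.
    for line in (res.stdout + res.stderr).splitlines():
        if line.strip():
            return line.strip()
    return None


def free_gib(paths: dict[str, str]) -> dict[str, float]:
    """Map each label (e.g. 'home', 'scratch') to free GiB on that path's filesystem."""
    return {label: shutil.disk_usage(p).free / 2**30 for label, p in paths.items()}


if __name__ == "__main__":
    print("fasterq-dump:", tool_version())
    print("prefetch:", tool_version("prefetch"))
    # HYPOTHETICAL layout: replace tempfile.gettempdir() with your scratch mount.
    print(free_gib({"home": os.path.expanduser("~"), "scratch": tempfile.gettempdir()}))
```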
Root causes (common)
- Extracting directly on a slow or shared filesystem, which creates I/O bottlenecks.
- Temporary files that are larger than expected and exhaust the available scratch space.
- Multithreading and buffering that drive up RAM usage.
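The second root cause can be caught before extraction starts by comparing a rough space estimate against free space. Below is a minimal sketch with two loudly labeled assumptions. First, FASTQ output is assumed to be about output_factor times the .sra size. Second, peak temp usage is assumed to be about temp_factor times the FASTQ output. Neither factor is an NCBI figure. Calibrate both from a previous extraction of a similar run, for example total FASTQ bytes divided by .sra bytes. The check_space name and its headroom margin are hypothetical. When the temp and output directories share a filesystem, the two requirements are summed.

```python
from __future__ import annotations

import os
import shutil
from dataclasses import dataclass


@dataclass
class SpaceCheck:
    need_temp: int  # estimated peak bytes in the temp dir
    need_out: int   # estimated bytes of final FASTQ output
    ok: bool        # True if both fit (summed when they share a filesystem)


def check_space(sra_bytes: int, temp_dir: str, out_dir: str, *,
                output_factor: float, temp_factor: float,
                headroom: float = 1.2) -> SpaceCheck:
    # ASSUMPTION: output ~= output_factor * .sra size and
    # peak temp ~= temp_factor * output. Calibrate both yourself.
    need_out = int(sra_bytes * output_factor * headroom)
    need_temp = int(sra_bytes * output_factor * temp_factor * headroom)
    same_fs = os.stat(temp_dir).st_dev == os.stat(out_dir).st_dev
    free_temp = shutil.disk_usage(temp_dir).free
    free_out = shutil.disk_usage(out_dir).free
    if same_fs:
        ok = need_temp + need_out <= free_temp
    else:
        ok = need_temp <= free_temp and need_out <= free_out
    return SpaceCheck(need_temp, need_out, ok)
```

After prefetch has finished, os.path.getsize on the downloaded .sra file gives sra_bytes. If check_space(...).ok is False, choose a larger scratch area before running fasterq-dump rather than letting it fail partway through.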
Fix workflow
- Use prefetch to stage data reliably before extraction.
- Use fasterq-dump with an explicit fast temp directory (local scratch, or a RAM disk when the data comfortably fits in memory).
- Validate outputs (paired reads, read counts) before deleting intermediates.
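The fix workflow above can be sketched as a short pipeline. It stages the data with prefetch, checks integrity with vdb-validate, and extracts with fasterq-dump using an explicit temp directory and a capped thread count. Only after the mate files agree on read count is anything deleted. The flags used (prefetch --max-size and -O; fasterq-dump --split-3, --threads, --temp and --outdir) are standard sra-tools options. Memory-related flags vary between sra-tools versions, so check fasterq-dump --help before adding any. Several parts are assumptions:

- The helper names are hypothetical.
- The 50G size cap and thread default are arbitrary.
- The staged layout <stage_dir>/<acc>/<acc>.sra is what recent prefetch versions produce, but some runs arrive as .sralite instead.
- Each tool is assumed to exit non-zero on failure.

```python
from __future__ import annotations

import gzip
import subprocess
from pathlib import Path


def build_commands(acc: str, stage_dir: str, temp_dir: str, out_dir: str,
                   threads: int = 4, max_size: str = "50G") -> list[list[str]]:
    """Return argv lists for prefetch -> vdb-validate -> fasterq-dump on one run."""
    # ASSUMPTION: prefetch -O <stage_dir> writes <stage_dir>/<acc>/<acc>.sra
    # (recent sra-tools layout; some runs arrive as .sralite instead).
    sra_path = str(Path(stage_dir) / acc / f"{acc}.sra")
    return [
        ["prefetch", acc, "--max-size", max_size, "-O", stage_dir],
        ["vdb-validate", sra_path],
        # Keep temp files on fast scratch and cap threads to bound RAM.
        ["fasterq-dump", sra_path, "--split-3", "--threads", str(threads),
         "--temp", temp_dir, "--outdir", out_dir],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each step in order; check=True stops at the first non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)


def count_fastq_records(path: str | Path) -> int:
    """Count 4-line FASTQ records (plain or .gz), rejecting truncated files."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pair(r1: str | Path, r2: str | Path) -> int:
    """Return the shared read count of two mate files, or raise if they differ."""
    n1, n2 = count_fastq_records(r1), count_fastq_records(r2)
    if n1 != n2:
        raise ValueError(f"mate files disagree: {n1} vs {n2} reads")
    return n1
```

For a paired run, a typical sequence would be run_all(build_commands("SRR000001", "/scratch/stage", "/scratch/tmp", "/scratch/out")) followed by check_pair on the _1.fastq and _2.fastq files that --split-3 writes. The /scratch paths are hypothetical. With --split-3, orphan reads may also land in a separate unpaired <acc>.fastq. Delete the staged .sra only after check_pair returns.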
Expected result
- Predictable extraction times; no surprise OOM or disk-limit failures.
References
- https://github.com/ncbi/sra-tools/issues/24
- https://github.com/ncbi/sra-tools/issues/424
- https://github.com/ncbi/sra-tools/wiki/HowTo%3A-fasterq-dump
- https://github.com/ncbi/sra-tools/wiki/08.-prefetch-and-fasterq-dump
Tags: #bioinformatics #data-transfer #hpc #sequencing