
NCBI SRA Download & FASTQ Extraction Optimizer

Make prefetch + fasterq-dump reliable and resource-aware.

Reduce hangs, excessive RAM, and disk blowups when pulling sequencing reads from SRA by using recommended prefetch workflows, scratch temp dirs, and size-check awareness.

Submitted by Community · 15 min

INGREDIENTS

🐙 GitHub · 🔍 Web

PROMPT

You are OpenClaw. Ask for accession IDs, filesystem constraints (home/scratch sizes), and the exact sra-tools commands used. Then propose an optimized prefetch + fasterq-dump workflow including temp-dir selection, thread/memory tuning, and output validation checks. Include guardrails (space estimates, vdb-validate integrity checks, and paired-end verification).

Pain point

fastq-dump/fasterq-dump can be extremely slow, memory-hungry, or fail outright due to insufficient temporary space, especially on shared filesystems.

Repro/diagnostic steps

  1. Capture tool version and command invocation.
  2. Measure available temp space and output space (scratch vs home).
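
The two diagnostic steps above can be sketched as a short shell check. The scratch path is an assumption; replace it with your site's actual scratch mount:

```shell
# Step 1: record the exact tool version (skipped if sra-tools is not on PATH).
if command -v fasterq-dump >/dev/null 2>&1; then
    fasterq-dump --version
fi

# Step 2: compare free space on the candidate temp/output filesystems.
SCRATCH="${SCRATCH:-/tmp}"   # assumption: point this at your scratch mount
df -h "$SCRATCH" "$HOME"
```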

Root causes (common)

  • Extracting directly on slow/shared FS (I/O bottlenecks).
  • Temporary files larger than expected; insufficient scratch.
  • Multithreading and buffering causing high RAM usage.
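
To catch the "temp files larger than expected" failure mode before it happens, budget pessimistically: a commonly cited rule of thumb is that fasterq-dump's temp plus output space can reach roughly 10x the staged .sra size (verify against your own data). `SRA_DIR` is a placeholder for the prefetch output directory, and the check uses GNU coreutils:

```shell
# Rule-of-thumb space check before extraction. The 10x multiplier is a
# commonly cited pessimistic budget, not a guarantee for every accession.
SRA_DIR="${SRA_DIR:-$PWD}"   # placeholder: directory holding staged .sra data

sra_bytes=$(du -sb "$SRA_DIR" | cut -f1)                       # staged size
need_bytes=$(( sra_bytes * 10 ))                               # budget
avail_bytes=$(( $(df --output=avail -B1 "$SRA_DIR" | tail -1) ))

if [ "$avail_bytes" -lt "$need_bytes" ]; then
    echo "WARNING: want ~${need_bytes} B free, only ${avail_bytes} B available"
else
    echo "OK: ${avail_bytes} B free covers the ~${need_bytes} B budget"
fi
```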

Fix workflow

  1. Use prefetch to stage the .sra file reliably before extraction (it can resume interrupted downloads).
  2. Run fasterq-dump with an explicit temp directory on fast storage (scratch or a RAM disk when appropriate) and tuned thread/memory limits.
  3. Validate outputs (vdb-validate passes, paired-read counts match) before deleting intermediates.
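
A minimal end-to-end sketch of the three steps, under stated assumptions: SRR000001 is a placeholder accession, /scratch/$USER is a fast filesystem, and the thread/memory values are illustrative starting points, not tuned recommendations. The workflow is wrapped in a function and only executed when sra-tools is actually on PATH:

```shell
run_sra_workflow() {
    local acc="$1"
    local scratch="${SCRATCH:-/scratch/$USER}/sra"   # assumption: fast FS
    local out="$scratch/fastq"
    mkdir -p "$scratch" "$out"

    # 1. Stage the .sra file first; prefetch can resume interrupted downloads.
    prefetch "$acc" --output-directory "$scratch"

    # Integrity check before spending hours on extraction.
    vdb-validate "$scratch/$acc"

    # 2. Extract with an explicit temp dir, thread count, and memory limit.
    fasterq-dump "$scratch/$acc" \
        --outdir "$out" \
        --temp "$scratch/tmp" \
        --threads 6 \
        --mem 2G \
        --split-files \
        --progress

    # 3. Paired-end verification: mate files must hold equal read counts
    #    (FASTQ stores 4 lines per read).
    local r1 r2
    r1=$(( $(wc -l < "$out/${acc}_1.fastq") / 4 ))
    r2=$(( $(wc -l < "$out/${acc}_2.fastq") / 4 ))
    if [ "$r1" -eq "$r2" ]; then
        echo "OK: $r1 read pairs"
    else
        echo "MISMATCH: R1=$r1 R2=$r2" >&2
        return 1
    fi
}

if command -v prefetch >/dev/null 2>&1; then
    run_sra_workflow SRR000001   # placeholder accession
else
    echo "sra-tools not on PATH; adapt run_sra_workflow to your environment"
fi
```

Keeping extraction inside the function makes it easy to swap in a site-specific scratch path or loop over many accessions.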

Expected result

  • Predictable extraction times; no surprise OOM or disk-limit failures.

References

  • https://github.com/ncbi/sra-tools/issues/24
  • https://github.com/ncbi/sra-tools/issues/424
  • https://github.com/ncbi/sra-tools/wiki/HowTo%3A-fasterq-dump
  • https://github.com/ncbi/sra-tools/wiki/08.-prefetch-and-fasterq-dump
Tags: #bioinformatics #data-transfer #hpc #sequencing