
NCBI SRA Download & FASTQ Extraction Optimizer

Make prefetch + fasterq-dump reliable and resource-aware.

Reduce hangs, excessive RAM, and disk blowups when pulling sequencing reads from SRA by using recommended prefetch workflows, scratch temp dirs, and size-check awareness.

Submitted by Community · 15 min

INGREDIENTS

🐙 GitHub · 🔍 Web

PROMPT

You are OpenClaw. Ask for accession IDs, filesystem constraints (home/scratch sizes), and the exact sra-tools commands used. Then propose an optimized prefetch + fasterq-dump workflow including temp-dir selection, thread/memory tuning, and output validation checks. Include guardrails (space estimates, vdb-validate integrity checks, and paired-end verification).

Pain point

fastq-dump/fasterq-dump can be extremely slow, memory-hungry, or fail outright due to insufficient temporary space, especially on shared filesystems.

Repro/diagnostic steps

  1. Capture tool version and command invocation.
  2. Measure available temp space and output space (scratch vs home).
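
The two diagnostic steps above can be sketched as a short shell check. The scratch path is an assumption; replace it with your site's actual scratch mount:

```shell
# Step 1: record the exact tool version (skipped if sra-tools is not on PATH).
if command -v fasterq-dump >/dev/null 2>&1; then
    fasterq-dump --version
fi

# Step 2: compare free space on the candidate temp/output filesystems.
SCRATCH="${SCRATCH:-/tmp}"   # assumption: point this at your scratch mount
df -h "$SCRATCH" "$HOME"
```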

Root causes (common)

  • Extracting directly on slow/shared FS (I/O bottlenecks).
  • Temporary files larger than expected; insufficient scratch.
  • Multithreading and buffering causing high RAM usage.
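
To catch the "temp files larger than expected" failure mode before it happens, budget pessimistically: a commonly cited rule of thumb is that fasterq-dump's temp plus output space can reach roughly 10x the staged .sra size (verify against your own data). `SRA_DIR` is a placeholder for the prefetch output directory, and the check uses GNU coreutils:

```shell
# Rule-of-thumb space check before extraction. The 10x multiplier is a
# commonly cited pessimistic budget, not a guarantee for every accession.
SRA_DIR="${SRA_DIR:-$PWD}"   # placeholder: directory holding staged .sra data

sra_bytes=$(du -sb "$SRA_DIR" | cut -f1)                       # staged size
need_bytes=$(( sra_bytes * 10 ))                               # budget
avail_bytes=$(( $(df --output=avail -B1 "$SRA_DIR" | tail -1) ))

if [ "$avail_bytes" -lt "$need_bytes" ]; then
    echo "WARNING: want ~${need_bytes} B free, only ${avail_bytes} B available"
else
    echo "OK: ${avail_bytes} B free covers the ~${need_bytes} B budget"
fi
```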

Fix workflow

  1. Use prefetch to stage the .sra file reliably before extraction (it can resume interrupted downloads).
  2. Run fasterq-dump with an explicit temp directory on fast storage (scratch or a RAM disk when appropriate) and tuned thread/memory limits.
  3. Validate outputs (vdb-validate passes, paired-read counts match) before deleting intermediates.
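
A minimal end-to-end sketch of the three steps, under stated assumptions: SRR000001 is a placeholder accession, /scratch/$USER is a fast filesystem, and the thread/memory values are illustrative starting points, not tuned recommendations. The workflow is wrapped in a function and only executed when sra-tools is actually on PATH:

```shell
run_sra_workflow() {
    local acc="$1"
    local scratch="${SCRATCH:-/scratch/$USER}/sra"   # assumption: fast FS
    local out="$scratch/fastq"
    mkdir -p "$scratch" "$out"

    # 1. Stage the .sra file first; prefetch can resume interrupted downloads.
    prefetch "$acc" --output-directory "$scratch"

    # Integrity check before spending hours on extraction.
    vdb-validate "$scratch/$acc"

    # 2. Extract with an explicit temp dir, thread count, and memory limit.
    fasterq-dump "$scratch/$acc" \
        --outdir "$out" \
        --temp "$scratch/tmp" \
        --threads 6 \
        --mem 2G \
        --split-files \
        --progress

    # 3. Paired-end verification: mate files must hold equal read counts
    #    (FASTQ stores 4 lines per read).
    local r1 r2
    r1=$(( $(wc -l < "$out/${acc}_1.fastq") / 4 ))
    r2=$(( $(wc -l < "$out/${acc}_2.fastq") / 4 ))
    if [ "$r1" -eq "$r2" ]; then
        echo "OK: $r1 read pairs"
    else
        echo "MISMATCH: R1=$r1 R2=$r2" >&2
        return 1
    fi
}

if command -v prefetch >/dev/null 2>&1; then
    run_sra_workflow SRR000001   # placeholder accession
else
    echo "sra-tools not on PATH; adapt run_sra_workflow to your environment"
fi
```

Keeping extraction inside the function makes it easy to swap in a site-specific scratch path or loop over many accessions.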

Expected result

  • Predictable extraction times; no surprise OOM or disk-limit failures.

References

  • https://github.com/ncbi/sra-tools/issues/24
  • https://github.com/ncbi/sra-tools/issues/424
  • https://github.com/ncbi/sra-tools/wiki/HowTo%3A-fasterq-dump
  • https://github.com/ncbi/sra-tools/wiki/08.-prefetch-and-fasterq-dump
Tags: #bioinformatics #data-transfer #hpc #sequencing