
ISeq: an Integrated Tool to Fetch Public Sequencing Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2024 Oct 24
PMID 39447029
Abstract

Motivation: High-throughput sequencing technologies [next-generation sequencing (NGS)] are increasingly used to address diverse biological questions. Although NGS data carry rich information, and repositories such as the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC) hold ever-growing datasets, programmatic access to public sequencing data and metadata remains limited.

Results: We developed iSeq, a command-line tool for quick and straightforward retrieval of metadata and NGS data from multiple databases. iSeq supports simultaneous retrieval from the GSA, SRA, ENA, and DDBJ databases. It handles more than 25 accession formats and supports Aspera downloads, parallel downloads, multi-threaded processing, FASTQ file merging, and integrity verification, simplifying data acquisition and making public NGS data easier to reanalyze.
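
As a rough illustration of how accession formats relate to the source databases, the following Python sketch routes an accession to the archive that issued it, based on its prefix. It is a hypothetical helper (`classify_accession` is not part of iSeq and does not reflect its internal logic), it covers only a handful of common formats rather than the 25+ that iSeq supports, and the record-level labels are approximate.

```python
from __future__ import annotations

import re

# Hypothetical helper for illustration only; NOT iSeq's implementation.
# INSDC read-archive accessions share one scheme:
# <archive letter> + "R" + <level letter> + digits,
# e.g. SRR000001 (SRA run), ERX000001 (ENA experiment), DRP000001 (DDBJ study).
INSDC_ARCHIVES = {"S": "SRA", "E": "ENA", "D": "DDBJ"}
INSDC_LEVELS = {"A": "submission", "P": "study", "S": "sample",
                "X": "experiment", "R": "run"}
# GSA uses the same pattern with "C"; only CRA, CRX and CRR are handled here.
GSA_LEVELS = {"A": "submission", "X": "experiment", "R": "run"}
# BioProject accessions encode the issuing archive after "PRJ"
# (PRJNA is issued by NCBI and is reported here as "SRA").
BIOPROJECT_ARCHIVES = {"NA": "SRA", "EB": "ENA", "DB": "DDBJ", "CA": "GSA"}

READ_PATTERN = re.compile(r"^([SEDC])R([APSXR])\d+$")
BIOPROJECT_PATTERN = re.compile(r"^PRJ(NA|EB|DB|CA)\d+$")


def classify_accession(accession: str) -> tuple[str, str]:
    """Return (issuing archive, record level); raise ValueError if unknown."""
    acc = accession.strip().upper()

    m = READ_PATTERN.match(acc)
    if m:
        archive_letter, level_letter = m.group(1), m.group(2)
        if archive_letter == "C":
            if level_letter not in GSA_LEVELS:
                raise ValueError(f"unrecognized GSA accession: {acc!r}")
            return "GSA", GSA_LEVELS[level_letter]
        return INSDC_ARCHIVES[archive_letter], INSDC_LEVELS[level_letter]

    m = BIOPROJECT_PATTERN.match(acc)
    if m:
        return BIOPROJECT_ARCHIVES[m.group(1)], "bioproject"

    raise ValueError(f"unrecognized accession format: {acc!r}")


if __name__ == "__main__":
    for acc in ["SRR000001", "ERX000001", "CRR000001", "PRJCA000001"]:
        print(acc, classify_accession(acc))
```

Because the INSDC members (SRA, ENA, and DDBJ) exchange data with one another, the prefix identifies only the archive that issued an accession; the same run can usually be retrieved from any of the three, whereas GSA accessions are resolved separately.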

Availability and Implementation: iSeq is freely available on Bioconda (https://anaconda.org/bioconda/iseq) and GitHub (https://github.com/BioOmics/iSeq).
