Metadata Retrieval from Sequence Databases with Ffq

Overview
Journal Bioinformatics
Specialty Biology
Date 2023 Jan 7
PMID 36610997
Abstract

Motivation: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction.

Results: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access.
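As a minimal sketch of how such JSON output might be consumed downstream, the Python snippet below walks a nested record and collects every raw-data link it finds. The record is a hypothetical placeholder, not real ffq output: the accessions, the field names (including "url"), and the links are all assumptions made for illustration. The walk is recursive so that it does not depend on the exact depth of the hierarchy.

```python
import json

# Hypothetical record shaped like nested JSON metadata. Every accession,
# field name, and URL below is a made-up placeholder, not real ffq output.
SAMPLE = json.loads("""
{
  "STUDY0001": {
    "accession": "STUDY0001",
    "runs": {
      "RUN0001": {
        "accession": "RUN0001",
        "files": [
          {"filetype": "fastq", "url": "ftp://example.org/RUN0001_1.fastq.gz"},
          {"filetype": "fastq", "url": "ftp://example.org/RUN0001_2.fastq.gz"}
        ]
      }
    }
  }
}
""")


def collect_urls(node):
    """Recursively gather every string stored under a "url" key (assumed name)."""
    urls = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                urls.append(value)
            else:
                # Descend into nested studies, runs, file lists, and so on.
                urls.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item))
    return urls


if __name__ == "__main__":
    for url in collect_urls(SAMPLE):
        print(url)
```

In practice the input would be JSON saved from the tool rather than an inline string, and the key to search for should be checked against the actual output before relying on it.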

Availability and Implementation: ffq is free and open source; the code is available at https://github.com/pachterlab/ffq.

Citing Articles

iSeq: an integrated tool to fetch public sequencing data.

Chao H, Li Z, Chen D, Chen M Bioinformatics. 2024; 40(11).

PMID: 39447029 PMC: 11561040. DOI: 10.1093/bioinformatics/btae641.


Meta-analysis of the Microbial Diversity Cultured in Bioreactors Simulating the Gut Microbiome.

Mendez D, Egan S, Wist J, Holmes E, Sanabria J Microb Ecol. 2024; 87(1):57.

PMID: 38587527 PMC: 11001690. DOI: 10.1007/s00248-024-02369-0.


GINSA: an accumulator for paired locality and next-generation small ribosomal subunit sequence data.

Odle E, Kahng S, Riewluang S, Kurihara K, Wakeman K Bioinformatics. 2024; 40(4).

PMID: 38502961 PMC: 10987208. DOI: 10.1093/bioinformatics/btae152.


Quantifying orthogonal barcodes for sequence census assays.

Booeshaghi A, Min K, Gehring J, Pachter L Bioinform Adv. 2024; 4(1):vbad181.

PMID: 38213823 PMC: 10783946. DOI: 10.1093/bioadv/vbad181.


Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression.

Luebbert L, Sullivan D, Carilli M, Hjorleifsson K, Viloria Winnett A, Chari T bioRxiv. 2024.

PMID: 38168363 PMC: 10760059. DOI: 10.1101/2023.12.11.571168.

