GEOfetch: a Command-line Tool for Downloading Data and Standardized Metadata from GEO and SRA

Overview
Journal Bioinformatics
Specialty Biology
Date 2023 Mar 1
PMID 36857584
Abstract

Motivation: The Gene Expression Omnibus (GEO) has become an important source of biological data for secondary analysis. However, there is no simple, programmatic way to download data and metadata from GEO in a standardized annotation format.

Results: To address this, we present GEOfetch, a command-line tool that downloads and organizes data and metadata from GEO and the Sequence Read Archive (SRA). GEOfetch formats the downloaded metadata as a Portable Encapsulated Project (PEP), providing a universal format for the reanalysis of public data.
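As a rough illustration of the command-line usage described above, the following Python sketch assembles a GEOfetch call for a single accession and runs it only if the tool is installed. The helper names `build_geofetch_command` and `fetch` are hypothetical and not part of GEOfetch, and the accession GSE95654 is just an example. The sketch assumes the `-i` (input accession) and `--just-metadata` options behave as in the GEOfetch documentation; check `geofetch --help` for the installed version.

```python
import shutil
import subprocess


def build_geofetch_command(accession, just_metadata=True):
    """Return the argument list for a geofetch call on one GEO accession.

    Hypothetical helper for illustration; not part of GEOfetch itself.
    """
    command = ["geofetch", "-i", accession]
    if just_metadata:
        # Assumed flag: request metadata only and skip the raw data download.
        command.append("--just-metadata")
    return command


def fetch(accession, just_metadata=True):
    """Run geofetch if it is on the PATH; return None if it is not installed."""
    command = build_geofetch_command(accession, just_metadata)
    if shutil.which("geofetch") is None:
        return None
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works offline.
    print(" ".join(build_geofetch_command("GSE95654")))
```

Building the argument list separately from running it keeps the sketch usable without network access or an installed copy of the tool.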

Availability and Implementation: GEOfetch is available on Bioconda and the Python Package Index (PyPI).

Citing Articles

Methods for evaluating unsupervised vector representations of genomic regions.

Zheng G, Rymuza J, Gharavi E, LeRoy N, Zhang A, Sheffield N. NAR Genom Bioinform. 2024; 6(3):lqae086.

PMID: 39131817 PMC: 11316252. DOI: 10.1093/nargab/lqae086.


PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata.

LeRoy N, Khoroshevskyi O, O'Brien A, Stepien R, Arslan A, Sheffield N. Gigascience. 2024; 13.

PMID: 38991851 PMC: 11238423. DOI: 10.1093/gigascience/giae033.


PDL1 targeting by miR-138-5p amplifies anti-tumor immunity and Jurkat cells survival in non-small cell lung cancer.

Rostami F, Tavakol Hamedani Z, Sadoughi A, Mehrabadi M, Kouhkan F. Sci Rep. 2024; 14(1):13542.

PMID: 38866824 PMC: 11169246. DOI: 10.1038/s41598-024-62064-5.


OMD Curation Toolkit: a workflow for in-house curation of public omics datasets.

Piquer-Esteban S, Arnau V, Diaz W, Moya A. BMC Bioinformatics. 2024; 25(1):184.

PMID: 38724907 PMC: 11084137. DOI: 10.1186/s12859-024-05803-9.


Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets.

Gharavi E, LeRoy N, Zheng G, Zhang A, Brown D, Sheffield N. Bioengineering (Basel). 2024; 11(3).

PMID: 38534537 PMC: 10967841. DOI: 10.3390/bioengineering11030263.

