Curation of over 10 000 Transcriptomic Studies to Enable Data Reuse

Overview
Specialty Biology
Date 2021 Feb 18
PMID 33599246
Citations 21
Abstract

Vast amounts of transcriptomic data reside in public repositories, but effective reuse remains challenging. Issues include unstructured dataset metadata, inconsistent data processing and quality control, and inconsistent probe-gene mappings across microarray technologies. Thus, extensive curation and data reprocessing are necessary prior to any reuse. The Gemma bioinformatics system was created to help address these issues. Gemma consists of a database of curated transcriptomic datasets, analytical software, a web interface and web services. Here we present an update on Gemma's holdings, data processing and analysis pipelines, our curation guidelines, and software features. As of June 2020, Gemma contains 10 811 manually curated datasets (primarily human, mouse and rat), over 395 000 samples and hundreds of curated transcriptomic platforms (both microarray and RNA sequencing). Dataset topics were represented with 10 215 distinct terms from 12 ontologies, for a total of 54 316 topic annotations (mean topics/dataset = 5.2). While Gemma has broad coverage of conditions and tissues, it captures a large majority of available brain-related datasets, which account for 34% of its holdings. Users can access the curated data and differential expression analyses through the Gemma website, a RESTful service and an R package. Database URL: https://gemma.msl.ubc.ca/home.html.

Citing Articles

Prenatal gene-environment interactions mediate the impact of advanced maternal age on mouse offspring behavior.

Zietek M, Jaszczyk A, Stankiewicz A, Sampino S Sci Rep. 2024; 14(1):31733.

PMID: 39738558 PMC: 11685589. DOI: 10.1038/s41598-024-82070-x.


Annotating publicly-available samples and studies using interpretable modeling of unstructured metadata.

Yuan H, Hicks P, Ahmadian M, Johnson K, Valtadoros L, Krishnan A Brief Bioinform. 2024; 26(1).

PMID: 39710433 PMC: 11663484. DOI: 10.1093/bib/bbae652.


A meta-analysis of the effects of early life stress on the prefrontal cortex transcriptome suggests long-term effects on myelin.

Duan T, Hagenauer M, Flandreau E, Bader A, Nguyen D, Maras P bioRxiv. 2024.

PMID: 39605735 PMC: 11601536. DOI: 10.1101/2024.11.22.624315.


Resource: A curated database of brain-related functional gene sets (Brain.GMT).

Hagenauer M, Sannah Y, Hebda-Bauer E, Rhoads C, O'Connor A, Flandreau E MethodsX. 2024; 13:102788.

PMID: 39049932 PMC: 11267058. DOI: 10.1016/j.mex.2024.102788.


In silico approaches for drug repurposing in oncology: a scoping review.

Cavalcante B, Freitas R, Siquara da Rocha L, Santos R, Souza B, Ramos P Front Pharmacol. 2024; 15:1400029.

PMID: 38919258 PMC: 11196849. DOI: 10.3389/fphar.2024.1400029.

