PMID: 34591957

Gene Expression Nebulas (GEN): a Comprehensive Data Portal Integrating Transcriptomic Profiles Across Multiple Species at Both Bulk and Single-cell Levels

Abstract

Transcriptomic profiling is critical to uncovering functional elements at both the transcriptional and post-transcriptional levels. Here, we present Gene Expression Nebulas (GEN, https://ngdc.cncb.ac.cn/gen/), an open-access data portal integrating transcriptomic profiles under various biological contexts. GEN features a curated collection of high-quality bulk and single-cell RNA sequencing datasets, built using standardized data processing pipelines and a structured curation model. Currently, GEN houses gene expression profiles from 323 datasets (157 bulk and 166 single-cell), covering 50,500 samples and 15,540,169 cells across 30 species, which are further categorized into six biological contexts. Moreover, GEN integrates a full range of transcriptomic profiles on expression, RNA editing, and alternative splicing for 10 bulk datasets, allowing users to conduct integrative analysis at both transcriptional and post-transcriptional levels. In addition, GEN provides abundant gene annotations based on value-added curation of transcriptomic profiles and delivers online services for data analysis and visualization. Collectively, GEN presents a comprehensive collection of transcriptomic profiles across multiple species, thus serving as a fundamental resource for better understanding genetic regulatory architecture and functional mechanisms from tissues to cells.

Citing Articles

Editome Disease Knowledgebase v2.0: an updated resource of editome-disease associations through literature curation and integrative analysis.

Zhu T, Chu Y, Niu G, Pan R, Chen M, Cheng Y. Bioinform Adv. 2025; 5(1):vbaf012.

PMID: 39968378 PMC: 11835235. DOI: 10.1093/bioadv/vbaf012.


CBGDA: a manually curated resource for gene-disease associations based on genome-wide CRISPR.

Du Q, Zhang Z, Yang W, Zhou X, Zhou N, Wu C. Database (Oxford). 2024; 2024.

PMID: 39213392 PMC: 11363955. DOI: 10.1093/database/baae077.


mosaicMPI: a framework for modular data integration across cohorts and -omics modalities.

Verhey T, Seo H, Gillmor A, Thoppey-Manoharan V, Schriemer D, Morrissy S. Nucleic Acids Res. 2024; 52(12):e53.

PMID: 38813827 PMC: 11229337. DOI: 10.1093/nar/gkae442.


DeepFGRN: inference of gene regulatory network with regulation type based on directed graph embedding.

Gao Z, Su Y, Xia J, Cao R, Ding Y, Zheng C. Brief Bioinform. 2024; 25(3).

PMID: 38581416 PMC: 10998536. DOI: 10.1093/bib/bbae143.


Plant genomic resources at National Genomics Data Center: assisting in data-driven breeding applications.

Tian D, Xu T, Kang H, Luo H, Wang Y, Chen M. aBIOTECH. 2024; 5(1):94-106.

PMID: 38576435 PMC: 10987443. DOI: 10.1007/s42994-023-00134-4.

