Gene Expression Nebulas (GEN): a Comprehensive Data Portal Integrating Transcriptomic Profiles Across Multiple Species at Both Bulk and Single-cell Levels

Overview

Journal: Nucleic Acids Res

Publisher: Oxford University Press

Specialty: Biochemistry

Date: 2021 Sep 30

PMID: 34591957

Citations: 18

Authors

Yuansheng Zhang

Dong Zou

Tongtong Zhu

Tianyi Xu

Ming Chen

Guangyi Niu

Wenting Zong

Rong Pan

Wei Jing

Jian Sang

Chang Liu

Yujia Xiong

Yubin Sun

Shuang Zhai

Huanxin Chen

Wenming Zhao

Jingfa Xiao

Yiming Bao

Lili Hao

Zhang Zhang

Abstract

Transcriptomic profiling is critical to uncovering functional elements from transcriptional and post-transcriptional aspects. Here, we present Gene Expression Nebulas (GEN, https://ngdc.cncb.ac.cn/gen/), an open-access data portal integrating transcriptomic profiles under various biological contexts. GEN features a curated collection of high-quality bulk and single-cell RNA sequencing datasets by using standardized data processing pipelines and a structured curation model. Currently, GEN houses a large number of gene expression profiles from 323 datasets (157 bulk and 166 single-cell), covering 50 500 samples and 15 540 169 cells across 30 species, which are further categorized into six biological contexts. Moreover, GEN integrates a full range of transcriptomic profiles on expression, RNA editing and alternative splicing for 10 bulk datasets, providing opportunities for users to conduct integrative analysis at both transcriptional and post-transcriptional levels. In addition, GEN provides abundant gene annotations based on value-added curation of transcriptomic profiles and delivers online services for data analysis and visualization. Collectively, GEN presents a comprehensive collection of transcriptomic profiles across multiple species, thus serving as a fundamental resource for better understanding genetic regulatory architecture and functional mechanisms from tissues to cells.

Citing Articles

Editome Disease Knowledgebase v2.0: an updated resource of editome-disease associations through literature curation and integrative analysis.

Zhu T, Chu Y, Niu G, Pan R, Chen M, Cheng Y. Bioinform Adv. 2025; 5(1):vbaf012.

PMID: 39968378 PMC: 11835235. DOI: 10.1093/bioadv/vbaf012.

CBGDA: a manually curated resource for gene-disease associations based on genome-wide CRISPR.

Du Q, Zhang Z, Yang W, Zhou X, Zhou N, Wu C. Database (Oxford). 2024; 2024.

PMID: 39213392 PMC: 11363955. DOI: 10.1093/database/baae077.

mosaicMPI: a framework for modular data integration across cohorts and -omics modalities.

Verhey T, Seo H, Gillmor A, Thoppey-Manoharan V, Schriemer D, Morrissy S. Nucleic Acids Res. 2024; 52(12):e53.

PMID: 38813827 PMC: 11229337. DOI: 10.1093/nar/gkae442.

DeepFGRN: inference of gene regulatory network with regulation type based on directed graph embedding.

Gao Z, Su Y, Xia J, Cao R, Ding Y, Zheng C. Brief Bioinform. 2024; 25(3).

PMID: 38581416 PMC: 10998536. DOI: 10.1093/bib/bbae143.

Plant genomic resources at National Genomics Data Center: assisting in data-driven breeding applications.

Tian D, Xu T, Kang H, Luo H, Wang Y, Chen M. aBIOTECH. 2024; 5(1):94-106.

PMID: 38576435 PMC: 10987443. DOI: 10.1007/s42994-023-00134-4.
