Incorporating the Human Gene Annotations in Different Databases Significantly Improved Transcriptomic and Genetic Analyses

Overview
Journal: RNA
Specialty: Molecular Biology
Date: 2013 Feb 23
PMID: 23431329
Citations: 15
Abstract

Human gene annotation is crucial for transcriptomic and genetic studies; however, the impact of the human gene annotations in different databases on such studies has rarely been evaluated. To enable full use of the available human annotation resources and to better understand the human transcriptome, we systematically compared the effects of the human annotations in RefSeq, Ensembl (GENCODE), and AceView on diverse transcriptomic and genetic analyses. We found that the human gene annotations in all three databases are far from complete. Although Ensembl and AceView annotate more genes than RefSeq, more than 15,800 genes from Ensembl (or AceView) lie within the intergenic and intronic regions of the AceView (or Ensembl) annotation. The human transcriptome annotations in RefSeq, Ensembl, and AceView had distinct effects on short-read mapping, gene and isoform expression profiling, and differential expression calling. Furthermore, our findings indicate that integrating the annotations from these databases yields a more complete gene set and significantly enhances those transcriptomic analyses. We also observed that many more known SNPs are located within genes annotated in Ensembl and AceView than in RefSeq. In particular, 1033 of the 3041 trait/disease-associated SNPs, involved in about 200 human traits/diseases, that were previously reported to lie in RefSeq intergenic regions could be relocated within Ensembl and AceView genes. Our findings illustrate that a more complete transcriptome, generated by incorporating the human gene annotations in diverse databases, can strikingly improve the overall results of transcriptomic and genetic studies.
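The SNP relocation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the gene coordinates and SNP positions below are invented toy values, the function names `build_index` and `is_genic` are hypothetical, and coordinates are assumed to be 1-based and inclusive. The sketch only shows the idea of asking whether a SNP that is intergenic under one annotation falls inside a gene once other annotations are pooled.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(genes):
    """Group (chrom, start, end) gene intervals by chromosome and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping gene models collapse into one genic region.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_genic(index, chrom, pos):
    """Return True if pos lies inside any merged gene interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


# Toy, made-up annotations (assumption: not real RefSeq/Ensembl/AceView data).
refseq = [("chr1", 100, 200)]
ensembl = [("chr1", 100, 220), ("chr1", 500, 600)]
aceview = [("chr1", 90, 200), ("chr2", 50, 80)]
snps = [("chr1", 150), ("chr1", 550), ("chr2", 60), ("chr1", 300)]

ref_index = build_index(refseq)
pooled_index = build_index(ensembl + aceview)

# SNPs that are intergenic under the reference set but genic in the pooled set.
relocated = [
    snp for snp in snps
    if not is_genic(ref_index, *snp) and is_genic(pooled_index, *snp)
]
print(relocated)
```

Merging overlapping gene models before lookup keeps each query to a single binary search per SNP; a real analysis would more likely read GTF/GFF files and use an interval library, and would need to handle strand and intronic versus exonic placement separately.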

Citing Articles

Impact of genome build on RNA-seq interpretation and diagnostics.

Ungar R, Goddard P, Jensen T, Degalez F, Smith K, Jin C Am J Hum Genet. 2024; 111(7):1282-1300.

PMID: 38834072 PMC: 11267525. DOI: 10.1016/j.ajhg.2024.05.005.


Impact of genome build on RNA-seq interpretation and diagnostics.

Ungar R, Goddard P, Jensen T, Degalez F, Smith K, Jin C medRxiv. 2024.

PMID: 38260490 PMC: 10802764. DOI: 10.1101/2024.01.11.24301165.


Transcript assembly and annotations: Bias and adjustment.

Zhang Q, Shao M PLoS Comput Biol. 2023; 19(12):e1011734.

PMID: 38127855 PMC: 10769104. DOI: 10.1371/journal.pcbi.1011734.


Roadblock: improved annotations do not necessarily translate into new functional insights.

Hall N, Carlyle B, Haerty W, Tunbridge E Genome Biol. 2021; 22(1):320.

PMID: 34809684 PMC: 8607653. DOI: 10.1186/s13059-021-02542-5.


Impact of human gene annotations on RNA-seq differential expression analysis.

Hamaguchi Y, Zeng C, Hamada M BMC Genomics. 2021; 22(1):730.

PMID: 34625021 PMC: 8501603. DOI: 10.1186/s12864-021-08038-7.

