Coding Exon-structure Aware Realigner (CESAR) Utilizes Genome Alignments for Accurate Comparative Gene Annotation

Overview
Specialty Biochemistry
Date 2016 Mar 27
PMID 27016733
Citations 29
Abstract

Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner), which realigns coding exons while considering the reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons, and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes.
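
The two kinds of inactivating mutations mentioned in the abstract, frameshifts and splice site mutations, can be illustrated with a minimal sketch. This is not CESAR's realignment method, which is not described in the abstract; it only shows how an already aligned exon might be screened for such mutations. The function names and the restriction to canonical AG/GT splice site dinucleotides are assumptions made for illustration.

```python
from itertools import groupby


def frameshifting_gaps(ref_aln: str, query_aln: str) -> list[tuple[int, str, int]]:
    """Report gap runs in a pairwise exon alignment whose length is not a multiple of 3.

    Hypothetical helper, not part of CESAR. Each result is
    (alignment_column, kind, length), where kind is "I" for an insertion in
    the query (gap in the reference) and "D" for a deletion in the query.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    # Label every alignment column as insertion, deletion, or match/mismatch.
    labels = []
    for r, q in zip(ref_aln, query_aln):
        if r == "-" and q != "-":
            labels.append("I")
        elif q == "-" and r != "-":
            labels.append("D")
        else:
            labels.append("M")

    shifts = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        # An indel shifts the reading frame unless its length is divisible by 3.
        if label != "M" and length % 3 != 0:
            shifts.append((pos, label, length))
        pos += length
    return shifts


def has_canonical_splice_sites(upstream2: str, downstream2: str) -> bool:
    """Check the intronic dinucleotides flanking an internal exon.

    Assumes canonical sites only: AG (acceptor) directly upstream of the
    exon and GT (donor) directly downstream.
    """
    return upstream2.upper() == "AG" and downstream2.upper() == "GT"
```

For example, `frameshifting_gaps("ATGGCCAAATTT", "ATGGC-AAATTT")` reports a 1 bp deletion at column 5, while a 3 bp deletion preserves the frame and is not reported. Per the abstract, many such events in genome alignments are spurious, which is why a realignment step is applied rather than a simple screen like this one.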

Citing Articles

Conservation assessment of human splice site annotation based on a 470-genome alignment.

Minkin I, Salzberg S bioRxiv. 2023.

PMID: 38076842 PMC: 10705407. DOI: 10.1101/2023.12.01.569581.


High-quality haploid genomes corroborate 29 chromosomes and highly conserved synteny of genes in Hyles hawkmoths (Lepidoptera: Sphingidae).

Hundsdoerfer A, Schell T, Patzold F, Wright C, Yoshido A, Marec F BMC Genomics. 2023; 24(1):443.

PMID: 37550607 PMC: 10405479. DOI: 10.1186/s12864-023-09506-y.


ncOrtho: efficient and reliable identification of miRNA orthologs.

Langschied F, Leisegang M, Brandes R, Ebersberger I Nucleic Acids Res. 2023; 51(13):e71.

PMID: 37260093 PMC: 10359484. DOI: 10.1093/nar/gkad467.


Integrating gene annotation with orthology inference at scale.

Kirilenko B, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M Science. 2023; 380(6643):eabn3107.

PMID: 37104600 PMC: 10193443. DOI: 10.1126/science.abn3107.


Building the Chordata Olfactory Receptor Database using more than 400,000 receptors annotated by Genome2OR.

Han W, Wu Y, Zeng L, Zhao S Sci China Life Sci. 2022; 65(12):2539-2551.

PMID: 35696018 DOI: 10.1007/s11427-021-2081-6.

