Leveraging the Mouse Genome for Gene Prediction in Human: from Whole-genome Shotgun Reads to a Global Synteny Map

Overview
Journal: Genome Res
Specialty: Genetics
Date: 2003 Jan 17
PMID: 12529305
Citations: 40
Abstract

The availability of draft sequences for both the mouse and human genomes makes it possible, for the first time, to annotate whole mammalian genomes using comparative methods. TWINSCAN is a gene-prediction system that combines the methods of single-genome predictors like GENSCAN with information derived from genome comparison, thereby improving accuracy. Because TWINSCAN uses genomic sequence only, it is less biased toward highly and/or ubiquitously expressed genes than GENEWISE, GENOMESCAN, and other methods based on evidence derived from transcripts. We show that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map. TWINSCAN improves on the prior state of the art even when alignments from only 1X coverage of the mouse genome are available. Gene prediction accuracy improves steadily from 1X through 3X, more slowly from 3X to 4X, and relatively little thereafter. The assembly and the synteny map greatly speed the computations, however. Our human annotation using the mouse assembly is conservative, predicting only 25,622 genes, and appears to be one of the best de novo annotations of the human genome to date.

Citing Articles

Chromosome-scale genome assembly of the brown anole (Anolis sagrei), an emerging model species.

Geneva A, Park S, Bock D, de Mello P, Sarigol F, Tollis M. Commun Biol. 2022; 5(1):1126.

PMID: 36284162 PMC: 9596491. DOI: 10.1038/s42003-022-04074-5.


Progress, Challenges, and Surprises in Annotating the Human Genome.

Zerbino D, Frankish A, Flicek P. Annu Rev Genomics Hum Genet. 2020; 21:55-79.

PMID: 32421357 PMC: 7116059. DOI: 10.1146/annurev-genom-121119-083418.


Whole-Genome Alignment and Comparative Annotation.

Armstrong J, Fiddes I, Diekhans M, Paten B. Annu Rev Anim Biosci. 2018; 7:41-64.

PMID: 30379572 PMC: 6450745. DOI: 10.1146/annurev-animal-020518-115005.


Avian W and mammalian Y chromosomes convergently retained dosage-sensitive regulators.

Bellott D, Skaletsky H, Cho T, Brown L, Locke D, Chen N. Nat Genet. 2017; 49(3):387-394.

PMID: 28135246 PMC: 5359078. DOI: 10.1038/ng.3778.


The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation.

Ehrenkaufer G, Weedall G, Williams D, Lorenzi H, Caler E, Hall N. Genome Biol. 2013; 14(7):R77.

PMID: 23889909 PMC: 4053983. DOI: 10.1186/gb-2013-14-7-r77.

