
Integrating Database Homology in a Probabilistic Gene Structure Model

Overview
Publisher: World Scientific
Specialty: Biology
Date: 1997 Jan 1
PMID: 9390295
Citations: 11
Abstract

We present an improved stochastic model of genes in DNA, and describe a method for integrating database homology into the probabilistic framework. A generalized hidden Markov model (GHMM) describes the grammar of a legal parse of a DNA sequence. Probabilities are estimated for gene features by using dynamic programming to combine information from multiple sensors. We show how matches to homologous sequences from a database can be integrated into the probability estimation by interpreting the likelihood of a sequence in terms of the bit-cost to encode a sequence given a homology match. We also demonstrate how homology matches in protein databases can be exploited to help identify splice sites. Our experiments show significant improvements in the sensitivity and specificity of gene structure identification when these new features are added to our gene-finding system, Genie. Experimental results in tests using a standard set of annotated genes showed that Genie identified 95% of coding nucleotides correctly with a specificity of 91%, and 77% of exons were identified exactly.
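
The bit-cost interpretation described in the abstract can be illustrated with a small sketch. This is not Genie's implementation: the two-state grammar ("null" and "match"), the ungapped alignment with "-" marking unaligned positions, the parameters `p_match` and `switch_bits`, and the function names `base_bits` and `best_parse` are all assumptions made for illustration. A real GHMM would use a richer state grammar (exons, introns, splice sites) and explicit segment-length distributions. The sketch only shows how a dynamic program over segments can pick the parse with the fewest bits, where a segment's likelihood is 2 raised to the negative of its bit-cost.

```python
import math

NULL_BITS = 2.0  # uniform code over A, C, G, T: log2(4) bits per base


def base_bits(q, h, p_match=0.85):
    """Bits to encode base q given aligned homolog base h (assumed model)."""
    if q == h:
        return -math.log2(p_match)
    # The remaining probability mass is split evenly over the 3 other bases.
    return -math.log2((1.0 - p_match) / 3.0)


def best_parse(seq, homolog, switch_bits=4.0):
    """Return (segments, bits) for the minimum-bit parse of a non-empty seq.

    Segments alternate between 'null' (no homology used) and 'match'
    (encoded relative to the homolog). switch_bits is a hypothetical
    fixed cost charged at every segment boundary.
    """
    states = ("null", "match")
    n = len(seq)

    # Prefix sums of per-base costs so any segment cost is a subtraction.
    pre = {s: [0.0] for s in states}
    for q, h in zip(seq, homolog):
        pre["null"].append(pre["null"][-1] + NULL_BITS)
        pre["match"].append(pre["match"][-1] + base_bits(q, h))

    # cost[j][s]: fewest bits to encode seq[:j] with the last segment in state s.
    cost = [{s: math.inf for s in states} for _ in range(n + 1)]
    back = [{s: 0 for s in states} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in states:
            other = "match" if s == "null" else "null"
            for i in range(j):  # candidate segment is seq[i:j]
                prev = 0.0 if i == 0 else cost[i][other] + switch_bits
                total = prev + pre[s][j] - pre[s][i]
                if total < cost[j][s]:
                    cost[j][s] = total
                    back[j][s] = i

    # Trace back from the cheaper final state.
    state = min(states, key=lambda s: cost[n][s])
    bits = cost[n][state]
    segments, j = [], n
    while j > 0:
        i = back[j][state]
        segments.append((i, j, state))
        j = i
        state = "match" if state == "null" else "null"
    return segments[::-1], bits


# Toy input: the homolog aligns only to positions 8..15 of the query.
seq = "ACGTTGCAACGGATCCATGC"
hom = "--------ACGGATCC----"
segments, bits = best_parse(seq, hom)
print(segments, round(bits, 2), 2.0 ** -bits)
```

In this toy setting, a region is assigned to the "match" state only when the bits saved by encoding it relative to the homolog exceed the cost of the two segment boundaries, which is the sense in which a homology match raises the probability of a parse.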

Citing Articles

Position-dependent motif characterization using non-negative matrix factorization.
Hutchins L, Murphy S, Singh P, Graber J. Bioinformatics. 2008; 24(23):2684-90.
PMID: 18852176. PMC: 2639279. DOI: 10.1093/bioinformatics/btn526.

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments.
Haas B, Salzberg S, Zhu W, Pertea M, Allen J, Orvis J. Genome Biol. 2008; 9(1):R7.
PMID: 18190707. PMC: 2395244. DOI: 10.1186/gb-2008-9-1-r7.

The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates.
Ettwiller L, Paten B, Souren M, Loosli F, Wittbrodt J, Birney E. Genome Biol. 2005; 6(12):R104.
PMID: 16356267. PMC: 1414082. DOI: 10.1186/gb-2005-6-12-r104.

Gene identification in novel eukaryotic genomes by self-training algorithm.
Lomsadze A, Ter-Hovhannisyan V, Chernoff Y, Borodovsky M. Nucleic Acids Res. 2005; 33(20):6494-506.
PMID: 16314312. PMC: 1298918. DOI: 10.1093/nar/gki937.

Candidate-gene screening and association analysis at the autism-susceptibility locus on chromosome 16p: evidence of association at GRIN2A and ABAT.
Barnby G, Abbott A, Sykes N, Morris A, Weeks D, Mott R. Am J Hum Genet. 2005; 76(6):950-66.
PMID: 15830322. PMC: 1196454. DOI: 10.1086/430454.