
A Dictionary-based Approach for Gene Annotation

Overview
Journal J Comput Biol
Date 1999 Dec 3
PMID 10582576
Citations 6
Abstract

This paper describes a fast, fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein database OWL and the other from the dbEST database. The dictionaries support O(1)-time lookups of tuples (4-tuples for the OWL database and 11-tuples for the dbEST database). These lookups can be used to rapidly find, at every position in an input sequence, the longest match to the database sequences. Such matches provide useful information on segments shared between exons, on alternative splice sites, and on the frequencies of long tuples for statistical purposes. The dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
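The tuple-lookup idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not describe the actual data structure, so a hash-based `dict` (expected constant-time lookup on average) stands in for it, and the function names `build_tuple_dictionary` and `longest_matches`, the toy DNA sequences, and the choice of k = 4 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_tuple_dictionary(sequences, k):
    """Map every k-tuple to the (sequence index, offset) pairs where it occurs.

    Hypothetical helper: a plain hash table standing in for the paper's dictionary.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def longest_matches(query, sequences, index, k):
    """Return, for each query position, the length of the longest exact match
    to any database sequence that starts there (0 if no k-tuple seed matches)."""
    result = []
    for pos in range(len(query)):
        best = 0
        seed = query[pos:pos + k]
        if len(seed) == k:
            # One dictionary lookup finds every database occurrence of the seed.
            for seq_id, offset in index.get(seed, ()):
                target = sequences[seq_id]
                length = k
                # Extend the seed to the right while the characters agree.
                while (pos + length < len(query)
                       and offset + length < len(target)
                       and query[pos + length] == target[offset + length]):
                    length += 1
                best = max(best, length)
        result.append(best)
    return result


# Toy data (assumed for illustration; not taken from OWL or dbEST).
database = ["ACGTACGTGG", "TTACGTAC"]
query = "GACGTACC"
k = 4

index = build_tuple_dictionary(database, k)
matches = longest_matches(query, database, index, k)
print(matches)  # [0, 6, 5, 4, 0, 0, 0, 0]
```

In this sketch, matches shorter than k are reported as 0 because only k-tuple seeds are indexed. The per-position match lengths are the kind of raw signal that the abstract suggests could feed homology determination or tuple-frequency statistics.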

Citing Articles

Levenshtein Distance, Sequence Comparison and Biological Database Search.

Berger B, Waterman M, Yu Y IEEE Trans Inf Theory. 2021; 67(6):3287-3294.

PMID: 34257466 PMC: 8274556. DOI: 10.1109/tit.2020.2996543.


Improving the specificity of exon prediction using comparative genomics.

Wu J BMC Genomics. 2008; 9 Suppl 2:S13.

PMID: 18831778 PMC: 2559877. DOI: 10.1186/1471-2164-9-S2-S13.


Gene identification in novel eukaryotic genomes by self-training algorithm.

Lomsadze A, Ter-Hovhannisyan V, Chernoff Y, Borodovsky M Nucleic Acids Res. 2005; 33(20):6494-506.

PMID: 16314312 PMC: 1298918. DOI: 10.1093/nar/gki937.


A complexity reduction algorithm for analysis and annotation of large genomic sequences.

Chuang T, Lin W, Lee H, Wang C, Hsiao K, Wang Z Genome Res. 2003; 13(2):313-22.

PMID: 12566410 PMC: 420370. DOI: 10.1101/gr.313703.


Current methods of gene prediction, their strengths and weaknesses.

Mathe C, Sagot M, Schiex T, Rouze P Nucleic Acids Res. 2002; 30(19):4103-17.

PMID: 12364589 PMC: 140543. DOI: 10.1093/nar/gkf543.