
Locating Protein-coding Regions in Human DNA Sequences by a Multiple Sensor-neural Network Approach

Overview
Specialty Science
Date 1991 Dec 15
PMID 1763041
Citations 95
Abstract

Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with fewer than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
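The abstract describes the architecture only at a high level: several "sensor" algorithms each report a local property of the DNA sequence, and a neural network combines their outputs into a coding score. The following is a minimal sketch of that data flow under stated assumptions. The two sensors (GC fraction and a codon-position asymmetry statistic), the 99-base window, the 3-base step, the network shape, and every weight are hypothetical choices made for illustration; they are not the sensors, architecture, or trained parameters of the module described in the paper.

```python
import math
from typing import List, Sequence


def gc_fraction(window: str) -> float:
    """Hypothetical sensor 1: fraction of G or C bases in the window."""
    if not window:
        return 0.0
    return sum(1 for b in window if b in "GC") / len(window)


def position_asymmetry(window: str) -> float:
    """Hypothetical sensor 2: how unevenly each base falls on codon positions.

    For each base, count its occurrences at offsets 0, 1, 2 (mod 3) within the
    window, take max / (min + 1), and return the mean over A, C, G, T.
    Coding DNA tends to show a 3-base periodicity, which raises this value.
    """
    scores = []
    for base in "ACGT":
        counts = [0, 0, 0]
        for i, b in enumerate(window):
            if b == base:
                counts[i % 3] += 1
        scores.append(max(counts) / (min(counts) + 1))
    return sum(scores) / 4


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tiny_network(features: Sequence[float], hidden_w: List[List[float]],
                 hidden_b: List[float], out_w: List[float], out_b: float) -> float:
    """Feed-forward net: one sigmoid hidden layer, one sigmoid output in (0, 1)."""
    hidden = [sigmoid(sum(w * f for w, f in zip(ws, features)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


# Illustrative, untrained weights (hypothetical); a real module would learn these.
HIDDEN_W = [[1.0, 1.0], [0.5, -1.0]]
HIDDEN_B = [-2.0, 0.0]
OUT_W = [4.0, -2.0]
OUT_B = -1.0


def coding_scores(seq: str, window: int = 99, step: int = 3) -> List[float]:
    """Slide a window along seq; run the sensors, then combine them with the net."""
    seq = seq.upper()
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        feats = [gc_fraction(w), position_asymmetry(w)]
        scores.append(tiny_network(feats, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B))
    return scores
```

For example, `coding_scores("GCA" * 40)` (a strongly 3-periodic toy sequence) yields scores near 0.95, while `coding_scores("ACGT" * 30)` (no codon-position bias) stays below 0.5. The point of the design is modularity: each sensor can be developed and tested independently, and the network, rather than hand-set thresholds, decides how much each sensor's report should count.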

Citing Articles

gene prediction for protein-coding regions.

Baker L, David C, Jacobs D Bioinform Adv. 2023; 3(1):vbad105.

PMID: 37638212 PMC: 10448985. DOI: 10.1093/bioadv/vbad105.


Exploring Yeast as a Study Model of Pantothenate Kinase-Associated Neurodegeneration and for the Identification of Therapeutic Compounds.

Ceccatelli Berti C, Gilea A, De Gregorio M, Goffrini P Int J Mol Sci. 2021; 22(1).

PMID: 33396642 PMC: 7795310. DOI: 10.3390/ijms22010293.


Modeling human Coenzyme A synthase mutation in yeast reveals altered mitochondrial function, lipid content and iron metabolism.

Ceccatelli Berti C, Dallabona C, Lazzaretti M, Dusi S, Tosi E, Tiranti V Microb Cell. 2017; 2(4):126-135.

PMID: 28357284 PMC: 5348974. DOI: 10.15698/mic2015.04.196.


IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction.

Pokkuluri K, Inampudi R, Nedunuri S Adv Bioinformatics. 2014; 2014:261362.

PMID: 25132849 PMC: 4123571. DOI: 10.1155/2014/261362.


Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics.

Manning T, Sleator R, Walsh P Bioengineered. 2013; 5(2):80-95.

PMID: 24335433 PMC: 4049912. DOI: 10.4161/bioe.26997.


References
1. Bilofsky H, Burks C. The GenBank genetic sequence data bank. Nucleic Acids Res. 1988; 16(5):1861-3. PMC: 338181. DOI: 10.1093/nar/16.5.1861.

2. Olson M, Hood L, Cantor C, Botstein D. A common language for physical mapping of the human genome. Science. 1989; 245(4925):1434-5. DOI: 10.1126/science.2781285.

3. McLachlan A, Staden R, Boswell D. A method for measuring the non-random bias of a codon usage table. Nucleic Acids Res. 1984; 12(24):9567-75. PMC: 320481. DOI: 10.1093/nar/12.24.9567.

4. Devereux J, Haeberli P, Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 1984; 12(1 Pt 1):387-95. PMC: 321012. DOI: 10.1093/nar/12.1part1.387.

5. Hopfield J. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A. 1982; 79(8):2554-8. PMC: 346238. DOI: 10.1073/pnas.79.8.2554.