
Training Back-propagation Neural Networks to Define and Detect DNA-binding Sites

Overview
Specialty Biochemistry
Date 1991 Jan 25
PMID 2014171
Citations 22
Abstract

A three-layered back-propagation neural network was trained to recognize E. coli promoters of the 17-base spacing class. To this end, the network was presented with 39 promoter sequences and derivatives of those sequences as positive inputs; 60% A + T random sequences and sequences containing 2 promoter-down point mutations were used as negative inputs. The entire promoter sequence of 58 bases, approximately -50 to +8, was entered as input. The network was asked to associate an output of 1.0 with promoter sequence input and 0.0 with non-promoter input. Generally, after 100,000 input cycles, the network was virtually perfect in classifying the training set. A trained network was about 80% effective in recognizing 'new' promoters that were not in the training set, with a false positive rate below 0.1%. Network searches on pBR322 and on the lambda genome were also performed. Overall, the results were somewhat better than those of the best rule-based procedures. The trained network can be analyzed for both its choice of base and its relative weighting, positive and negative, at each position of the sequence. This method, which requires only appropriate input/output training pairs, can be used to define and search for any DNA regulatory sequence for which there are sufficient exemplars.
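
The setup described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a one-hot encoding with 4 inputs per base (232 inputs for 58 bases), a hidden layer of 4 units, a learning rate of 0.5, and a squared-error loss; none of these details are given in the abstract. The positive examples are toy sequences with the consensus hexamers TTGACA and TATAAT planted 17 bases apart at illustrative offsets, not the 39 real promoters or their derivatives, and the run is far shorter than 100,000 input cycles. All function and class names are hypothetical.

```python
import math
import random

BASES = "ACGT"
SEQ_LEN = 58  # window of roughly -50 to +8, as in the abstract


def one_hot(seq):
    """Encode a DNA string as a flat list with 4 inputs per base (assumed encoding)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input -> hidden -> single output unit, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, rng):
        # Small random initial weights; the last entry of each row is a bias.
        self.w_hid = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # zip() stops at the shorter sequence, so the bias is added separately.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hid]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Gradient of the squared error through the output sigmoid.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden weights are updated first, using the not-yet-updated w_out.
        for j, row in enumerate(self.w_hid):
            delta_h = delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
            for i, v in enumerate(x):
                if v:  # one-hot input: only active inputs change their weights
                    row[i] -= lr * delta_h * v
            row[-1] -= lr * delta_h
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        return out


def random_seq(rng, length=SEQ_LEN, at_fraction=0.6):
    """Random sequence with the given A+T content (60% as in the abstract)."""
    return "".join(
        rng.choice("AT") if rng.random() < at_fraction else rng.choice("CG")
        for _ in range(length))


def toy_promoter(rng):
    """Toy positive: consensus hexamers planted 17 bases apart (illustrative only)."""
    s = list(random_seq(rng))
    s[15:21] = "TTGACA"   # -35 consensus at a hypothetical offset
    s[38:44] = "TATAAT"   # -10 consensus, 17 bases downstream of the first hexamer
    return "".join(s)


rng = random.Random(0)
data = [(one_hot(toy_promoter(rng)), 1.0) for _ in range(20)]
data += [(one_hot(random_seq(rng)), 0.0) for _ in range(20)]

net = ThreeLayerNet(n_in=4 * SEQ_LEN, n_hidden=4, rng=rng)
for _ in range(60):          # 60 passes over 40 toy examples
    rng.shuffle(data)
    for x, target in data:
        net.train_step(x, target)

pos_mean = sum(net.forward(x)[1] for x, t in data if t == 1.0) / 20
neg_mean = sum(net.forward(x)[1] for x, t in data if t == 0.0) / 20
print(f"mean output on toy promoters: {pos_mean:.3f}")
print(f"mean output on random sequences: {neg_mean:.3f}")
```

To search a longer sequence in the spirit of the pBR322 and lambda scans, one would slide a 58-base window along it and call `net.forward(one_hot(window))[1]` at each offset. The per-position analysis mentioned in the abstract would correspond to inspecting `net.w_hid` in groups of four weights per base position.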

Citing Articles

Identifying functional transcription factor binding sites in yeast by considering their positional preference in the promoters.

Lai F, Chiu C, Yang T, Huang Y, Wu W PLoS One. 2014; 8(12):e83791.

PMID: 24386279 PMC: 3873331. DOI: 10.1371/journal.pone.0083791.


Rules extraction from neural networks applied to the prediction and recognition of prokaryotic promoters.

de Avila E Silva S, Gerhardt G, Echeverrigaray S Genet Mol Biol. 2011; 34(2):353-60.

PMID: 21734842 PMC: 3115335. DOI: 10.1590/s1415-47572011000200031.


A reexamination of information theory-based methods for DNA-binding site identification.

Erill I, ONeill M BMC Bioinformatics. 2009; 10:57.

PMID: 19210776 PMC: 2680408. DOI: 10.1186/1471-2105-10-57.


Computational gene finding in plants.

Pertea M, Salzberg S Plant Mol Biol. 2002; 48(1-2):39-48.

PMID: 11860211


A general procedure for locating and analyzing protein-binding sequence motifs in nucleic acids.

ONeill M Proc Natl Acad Sci U S A. 1998; 95(18):10710-5.

PMID: 9724769 PMC: 27960. DOI: 10.1073/pnas.95.18.10710.

