
Pol II Promoter Prediction Using Characteristic 4-mer Motifs: a Machine Learning Approach

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2008 Oct 7
PMID: 18834544
Citations: 14
Abstract

Background: Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult tasks in computational genomics, and it is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has created a need for better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date, but they have yet to provide acceptable predictive performance. One obvious way to improve on current methods is to devise a better system for selecting the features of promoters that distinguish them from non-promoters. Second, performance can be improved by enhancing the predictive ability of the machine-learning algorithms used.
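The features used in the Results are 4-mer motif frequencies. A minimal sketch of such a feature vector follows, assuming overlapping counts normalised by the number of valid windows. The abstract does not state the exact normalisation or how the 128 motifs were chosen, so the `motifs` parameter (defaulting to all 256 4-mers) and the names `ALL_4MERS` and `kmer_frequencies` are hypothetical.

```python
from itertools import product

# All 256 DNA 4-mers in lexicographic order: AAAA, AAAC, ..., TTTT.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]
VALID_BASES = set("ACGT")


def kmer_frequencies(seq, motifs=ALL_4MERS, k=4):
    """Return the relative frequency of each motif in `seq`, in `motifs` order.

    Hypothetical helper: overlapping windows are counted, and windows that
    contain a non-ACGT character (e.g. 'N') are skipped. Counts are divided
    by the number of valid windows, so a motif subset (such as a 128-motif
    selection) need not sum to 1.
    """
    seq = seq.upper()
    counts = dict.fromkeys(motifs, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if not set(window) <= VALID_BASES:
            continue  # ambiguous base in this window
        total += 1
        if window in counts:
            counts[window] += 1
    # Sequences too short (or all ambiguous) yield a zero vector.
    return [counts[m] / total if total else 0.0 for m in motifs]
```

For example, `kmer_frequencies("TATAAAAGGC")` returns a 256-element vector in which `TATA` has frequency 1/7, because the sequence has seven overlapping 4-mer windows. Normalising by window count keeps vectors comparable across sequences of different lengths.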

Results: In this paper, a novel approach is presented in which 128 4-mer motifs are used in conjunction with a non-linear machine-learning algorithm, a Support Vector Machine (SVM), to distinguish between promoter and non-promoter DNA sequences. Applied to plant, Drosophila, human, mouse and rat sequences, the classification model achieved 7-fold cross-validation accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35%, respectively. The high sensitivity and specificity values of 0.86 and 0.90 for plant, 0.96 and 0.92 for Drosophila, 0.88 and 0.92 for human, 0.78 and 0.84 for mouse, and 0.82 and 0.80 for rat demonstrate that this technique is less prone to false-positive results and performs better than many other tools. Moreover, the model successfully identifies the location of the promoter using a TATA weight matrix.
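Locating a promoter with a TATA weight matrix amounts to sliding a position weight matrix along the sequence and keeping the best-scoring window. The matrix used by the authors is not given in the abstract, so the sketch below builds a toy log-odds matrix from the consensus `TATAAA` under an assumed match probability of 0.7 and a uniform 0.25 background. The names `consensus_pwm` and `best_site` are hypothetical, only the forward strand is scanned, and this is not the published TATA-box matrix.

```python
import math

BASES = "ACGT"


def consensus_pwm(consensus, p_match=0.7, background=0.25):
    """Build a toy log-odds matrix (one dict per position) from a consensus.

    Illustrative assumption only: each position gives probability `p_match`
    to the consensus base and splits the remainder evenly over the other
    three bases; scores are log2(probability / background).
    """
    p_other = (1.0 - p_match) / 3
    return [
        {b: math.log2((p_match if b == c else p_other) / background) for b in BASES}
        for c in consensus.upper()
    ]


def best_site(seq, pwm):
    """Return (start, score) of the highest-scoring window on the forward strand.

    Windows containing non-ACGT characters are skipped; returns None when no
    valid window exists. Ties keep the leftmost window.
    """
    seq = seq.upper()
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue
        # Log-odds scores add across positions of the window.
        score = sum(col[b] for col, b in zip(pwm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

With `pwm = consensus_pwm("TATAAA")`, calling `best_site("GCGCGCTATAAAGGCGC", pwm)` reports the window starting at index 6. A real application would substitute an experimentally derived matrix and, typically, a score threshold and a scan of both strands.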

Conclusion: The high sensitivity and specificity indicate that 4-mer frequencies used with supervised machine-learning methods can be beneficial for identifying RNA Pol II promoters, compared with other methods. This approach can be extended to identify promoters in sequences from other eukaryotic genomes.
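The conclusions rest on sensitivity, specificity and accuracy. The abstract does not define these measures, so the sketch below assumes the common definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and accuracy = (TP+TN)/N. Some promoter-prediction papers instead report TP/(TP+FP) under the name "specificity", and the abstract does not say whether cross-validation results were pooled or averaged across folds, so this helper (the name `classification_metrics` is hypothetical) is not guaranteed to reproduce the figures above.

```python
import math


def _ratio(num, den):
    # Undefined ratios (e.g. no negative examples) are reported as NaN.
    return num / den if den else math.nan


def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for a binary promoter classifier.

    Assumed definitions:
      sensitivity = TP / (TP + FN)  fraction of true promoters recovered
      specificity = TN / (TN + FP)  fraction of non-promoters rejected
      accuracy    = (TP + TN) / N
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(y_true)),
    }
```

In a cross-validation setting such as the paper's 7-fold scheme, these would be computed on held-out predictions. Because accuracy under these definitions is a weighted average of sensitivity and specificity, it always lies between the two.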

Citing Articles

Biological and Molecular Components for Genetically Engineering Biosensors in Plants.

Liu Y, Yuan G, Hassan M, Abraham P, Mitchell J, Jacobson D Biodes Res. 2023; 2022:9863496.

PMID: 37850147 PMC: 10521658. DOI: 10.34133/2022/9863496.


Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction.

Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T Brief Bioinform. 2022; 23(2).

PMID: 35021193 PMC: 8921625. DOI: 10.1093/bib/bbab551.


Comparison of machine learning and deep learning techniques in promoter prediction across diverse species.

Bhandari N, Khare S, Walambe R, Kotecha K PeerJ Comput Sci. 2021; 7:e365.

PMID: 33817015 PMC: 7959599. DOI: 10.7717/peerj-cs.365.


Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure.

Zrimec J, Borlin C, Buric F, Muhammad A, Chen R, Siewers V Nat Commun. 2020; 11(1):6141.

PMID: 33262328 PMC: 7708451. DOI: 10.1038/s41467-020-19921-4.


Identification of Regulatory SNPs Associated with Vicine and Convicine Content of Vicia faba Based on Genotyping by Sequencing Data Using Deep Learning.

Heinrich F, Wutke M, Das P, Kamp M, Gultas M, Link W Genes (Basel). 2020; 11(6).

PMID: 32516876 PMC: 7349281. DOI: 10.3390/genes11060614.

