
Computational Technique for Improvement of the Position-weight Matrices for the DNA/protein Binding Sites

Overview
Specialty: Biochemistry
Date: 2005 Apr 26
PMID: 15849315
Citations: 34
Abstract

Position-weight matrices (PWMs) are widely used to locate transcription factor binding sites in DNA sequences, but most existing PWMs offer low sensitivity and low specificity. We present a new computational algorithm, a modification of the Staden-Bucher approach, that improves a given PWM. We applied the technique to the PWM of the GC-box, the binding site for Sp1. A comparison of the old and new PWMs shows that the new one increases both sensitivity and specificity. We also present the statistical parameters of the GC-box distribution in promoter regions, in the human genome as a whole, and in each chromosome. Most commonly used PWMs are 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines 16-row matrices, and preliminary results show that they perform better than 4-row matrices.
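To make the distinction between 4-row and 16-row matrices concrete, the following is a minimal sketch of generic log-odds PWM scoring. It is not the modified Staden-Bucher procedure described in the abstract, whose details are not given here. The function names, the uniform background frequencies, the pseudocount, and the `TOY_SITES` alignment are all assumptions made for illustration; the toy sites are invented GC-rich strings, not real Sp1 binding data.

```python
import math

BASES = "ACGT"
DINUCLEOTIDES = [a + b for a in BASES for b in BASES]

# Hypothetical aligned sites, invented for illustration only.
TOY_SITES = ["GGGGCGGGGC", "GGGGCGGGGT", "AGGGCGGGGC", "GGGGCGGAGC"]


def build_pwm(sites, pseudocount=1.0):
    """Build a 4-row log-odds PWM (one dict per column) from aligned sites.

    Assumes a uniform background of 0.25 per base.
    """
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return pwm


def score_window(pwm, window):
    """Sum the per-column log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


def build_dinucleotide_pwm(sites, pseudocount=1.0):
    """Build a 16-row log-odds matrix over adjacent base pairs.

    Column i describes positions (i, i+1); assumes a uniform
    background of 1/16 per dinucleotide.
    """
    pwm = []
    for pos in range(len(sites[0]) - 1):
        counts = {d: pseudocount for d in DINUCLEOTIDES}
        for site in sites:
            counts[site[pos:pos + 2]] += 1
        total = sum(counts.values())
        pwm.append({d: math.log2(counts[d] / total * 16) for d in DINUCLEOTIDES})
    return pwm


def score_window_dinucleotide(pwm, window):
    """Sum the log-odds scores of each adjacent pair in the window."""
    return sum(col[window[i:i + 2]] for i, col in enumerate(pwm))
```

The threshold passed to `scan` controls the trade-off the abstract refers to: lowering it recovers more true sites (sensitivity) at the cost of more false hits (specificity). The 16-row variant can capture dependencies between neighboring positions that a 4-row matrix treats as independent, but it has four times as many parameters per column to estimate from the same alignment.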

Citing Articles

The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes.

Wanniarachchi D, Viswakula S, Wickramasuriya A. BMC Bioinformatics. 2024; 25(1):371.

PMID: 39623329 PMC: 11613939. DOI: 10.1186/s12859-024-05995-0.


Investigating the sequence landscape in the initiator core promoter element using an enhanced MARZ algorithm.

Dresch J, Conrad R, Klonaros D, Drewell R. PeerJ. 2023; 11:e15597.

PMID: 37366427 PMC: 10290830. DOI: 10.7717/peerj.15597.


Bayesian Markov models improve the prediction of binding motifs beyond first order.

Ge W, Meier M, Roth C, Soding J. NAR Genom Bioinform. 2021; 3(2):lqab026.

PMID: 33928244 PMC: 8057495. DOI: 10.1093/nargab/lqab026.


Contribution of nonconsensus base pairs within ArsR binding sequences toward ArsR-DNA binding and arsenic-mediated transcriptional induction.

Chen X, Jiang X, Tie C, Yoo J, Wang Y, Xu M. J Biol Eng. 2019; 13:53.

PMID: 31182975 PMC: 6555750. DOI: 10.1186/s13036-019-0181-4.


Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences.

Andrabi M, Hutchins A, Miranda-Saavedra D, Kono H, Nussinov R, Mizuguchi K. Sci Rep. 2017; 7(1):4071.

PMID: 28642456 PMC: 5481346. DOI: 10.1038/s41598-017-03199-6.

