
Inclusion of Neighboring Base Interdependencies Substantially Improves Genome-wide Prokaryotic Transcription Factor Binding Site Prediction

Overview
Specialty: Biochemistry
Date: 2010 May 5
PMID: 20439311
Citations: 14
Abstract

Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next-generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology for prokaryotic transcription factor binding site prediction. Our methodology extends existing models by using conditional probabilities to account for base interdependencies between neighbouring positions, and it includes genomic background weighting. We tested it against other existing and novel methodologies, including position-specific weight matrices, first-order Hidden Markov Models and joint probability models. We also tested the use of gapped and ungapped alignments and the inclusion or exclusion of background weighting. We show that our best method enhances binding site prediction for all 22 Escherichia coli transcription factors with at least 20 known binding sites, with many showing substantial improvements. We highlight the advantage of using block alignments of binding sites over gapped alignments to capture neighbouring position interdependencies. We also show that combining these methods with ChIP-on-chip data has the potential to further improve binding site prediction. Finally, we have developed the ungapped likelihood under positional background platform: a user-friendly website that gives access to the prediction method devised in this work.
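
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea it describes: scoring a candidate site with conditional probabilities between neighbouring positions, weighted against a background. It is not the authors' implementation. The function names (`train_first_order`, `log_odds`), the pseudocount of 1, the uniform zero-order background and the toy binding sites are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_first_order(sites, pseudocount=1.0):
    """Estimate P(base at position 0) and P(base at i | base at i-1).

    `sites` is an ungapped block alignment: equal-length strings over ACGT.
    The pseudocount is an illustrative assumption, not taken from the paper.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length (ungapped block)")

    n = len(sites)
    first = Counter(s[0] for s in sites)
    p_first = {b: (first[b] + pseudocount) / (n + 4 * pseudocount) for b in BASES}

    cond = []  # cond[i - 1][(a, b)] = P(b at position i | a at position i - 1)
    for i in range(1, length):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        prev = Counter(s[i - 1] for s in sites)
        cond.append({
            (a, b): (pairs[(a, b)] + pseudocount) / (prev[a] + 4 * pseudocount)
            for a in BASES
            for b in BASES
        })
    return p_first, cond


def log_odds(seq, model, background):
    """Log-odds of `seq` under the neighbour-dependent model vs. the background.

    `background` maps each base to its probability; a zero-order background
    is assumed here purely for simplicity.
    """
    p_first, cond = model
    if len(seq) != len(cond) + 1:
        raise ValueError("sequence length must match the trained site length")
    score = math.log(p_first[seq[0]] / background[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(cond[i - 1][(seq[i - 1], seq[i])] / background[seq[i]])
    return score


# Toy example: hypothetical 4-bp binding sites and a uniform background.
toy_sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
uniform_bg = {b: 0.25 for b in BASES}
toy_model = train_first_order(toy_sites)

print(log_odds("ACGT", toy_model, uniform_bg))  # resembles the training sites
print(log_odds("TTTT", toy_model, uniform_bg))  # does not resemble them
```

A position-specific weight matrix would replace each conditional term with a per-position marginal. The sketch differs only in conditioning each base on its left neighbour, which is why it needs an ungapped block alignment in which neighbouring columns are always adjacent in the sequence.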

Citing Articles

BML: a versatile web server for bipartite motif discovery.

Vahed M, Vahed M, Garmire L Brief Bioinform. 2022; 23(1).

PMID: 34974623 PMC: 8769915. DOI: 10.1093/bib/bbab536.


DIpartite: A tool for detecting bipartite motifs by considering base interdependencies.

Vahed M, Ishihara J, Takahashi H PLoS One. 2019; 14(8):e0220207.

PMID: 31469855 PMC: 6716629. DOI: 10.1371/journal.pone.0220207.


Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies.

Nettling M, Treutler H, Cerquides J, Grosse I BMC Bioinformatics. 2017; 18(1):141.

PMID: 28249564 PMC: 5333389. DOI: 10.1186/s12859-017-1495-1.


Parametric bootstrapping for biological sequence motifs.

O'Neill P, Erill I BMC Bioinformatics. 2016; 17(1):406.

PMID: 27716039 PMC: 5052923. DOI: 10.1186/s12859-016-1246-8.


Knowledge-based three-body potential for transcription factor binding site prediction.

Qin W, Zhao G, Carson M, Jia C, Lu H IET Syst Biol. 2016; 10(1):23-9.

PMID: 26816396 PMC: 8687219. DOI: 10.1049/iet-syb.2014.0066.

