A Bayesian Search for Transcriptional Motifs

Overview

Journal PLoS One

Specialties General Medicine, Science

Date 2010 Dec 3

PMID 21124986

Citations 4

Authors

Andrew K Miller

Cristin G Print

Poul M F Nielsen

Edmund J Crampin

Abstract

Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models which are represented as position weight matrices (PWM). There are other methods using more sophisticated representations, but these have more limited databases, or are not publicly available. Therefore, this paper focuses on methods that search using one PWM per TF. An algorithm, MATCH™, for identifying TFBSs corresponding to a particular PWM is available, but is not based on a rigorous statistical model of TF binding, making it difficult to interpret or adjust the parameters and output of the algorithm. Furthermore, there is no public description of the algorithm sufficient to exactly reproduce it. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs, taking into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier. We implemented our classifier in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset, and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM's performance is at least comparable to the leading currently available algorithm.
Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools.

Citing Articles

Review of Different Sequence Motif Finding Algorithms.

Hashim F, Mabrouk M, Al-Atabany W. Avicenna J Med Biotechnol. 2019; 11(2):130-148.

PMID: 31057715 PMC: 6490410.

An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters.

Ramsey S. Bioinform Biol Insights. 2016; 9(Suppl 4):59-69.

PMID: 27812284 PMC: 5081247. DOI: 10.4137/BBI.S29330.

PairMotif+: a fast and effective algorithm for de novo motif discovery in DNA sequences.

Yu Q, Huo H, Zhang Y, Guo H, Guo H. Int J Biol Sci. 2013; 9(4):412-24.

PMID: 23678291 PMC: 3654438. DOI: 10.7150/ijbs.5786.

PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search.

Yu Q, Huo H, Zhang Y, Guo H. PLoS One. 2012; 7(10):e48442.

PMID: 23119020 PMC: 3485246. DOI: 10.1371/journal.pone.0048442.
