
M Are Better Than One: an Ensemble-based Motif Finder and Its Application to Regulatory Element Prediction

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2009 Feb 19
PMID: 19223448
Citations: 5
Abstract

Motivation: Identifying regulatory elements in genomic sequences is a key component in understanding the control of gene expression. Computationally, this problem is often addressed by motif discovery, where the goal is to find a set of mutually similar subsequences within a collection of input sequences. Though motif discovery is widely studied and many approaches to it have been suggested, it remains a challenging and as yet unresolved problem.
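To make the problem statement concrete, the following is a minimal sketch of motif discovery in its simplest textbook form: pick one length-k window from each input sequence so that the chosen windows are as similar to each other as possible. It is an illustration only and is not the SAMF method. The function names (`kmers`, `matches`, `best_motif`), the sum-of-pairwise-matches score, the toy sequences, and the assumption of exactly one motif instance per sequence are all assumptions introduced here. The exhaustive search is only feasible for very small inputs.

```python
from itertools import combinations, product


def kmers(seq, k):
    """Return (start, k-mer) for every length-k window of seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def matches(a, b):
    """Count positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_motif(seqs, k):
    """Exhaustively choose one k-mer per sequence (an assumption of this
    sketch) that maximizes the sum of pairwise matches."""
    best_score, best_choice = -1, None
    for choice in product(*(kmers(s, k) for s in seqs)):
        score = sum(matches(a[1], b[1]) for a, b in combinations(choice, 2))
        if score > best_score:
            best_score, best_choice = score, choice
    return best_score, best_choice


# Toy input (hypothetical): each sequence contains the 6-mer TTGACA.
seqs = ["ACGTTGACA", "TTGACAGGC", "GGTTGACAT"]
score, choice = best_motif(seqs, 6)
print(score, choice)
```

The one-instance-per-sequence and fixed-length assumptions in this sketch are the kind of restrictions that make the general problem hard to state well; real inputs may contain zero or several instances per sequence.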

Results: We introduce SAMF (Solution-Aggregating Motif Finder), a novel approach for motif discovery. SAMF is based on a Markov Random Field formulation, and its key idea is to uncover and aggregate multiple statistically significant solutions to the given motif finding problem. In contrast to many earlier methods, SAMF does not require prior estimates on the number of motif instances present in the data, is not limited by motif length, and allows motifs to overlap. Though SAMF is broadly applicable, these features make it particularly well suited for addressing the challenges of prokaryotic regulatory element detection. We test SAMF's ability to find transcription factor binding sites in an Escherichia coli dataset and show that it outperforms previous methods. Additionally, we uncover a number of previously unidentified binding sites in this data, and provide evidence that they correspond to actual regulatory elements.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Identification of large disjoint motifs in biological networks.

Elhesha R, Kahveci T. BMC Bioinformatics. 2016; 17(1):408.

PMID: 27716036 PMC: 5053092. DOI: 10.1186/s12859-016-1271-7.


LASAGNA: a novel algorithm for transcription factor binding site alignment.

Lee C, Huang C. BMC Bioinformatics. 2013; 14:108.

PMID: 23522376 PMC: 3747862. DOI: 10.1186/1471-2105-14-108.


Searching for transcription factor binding sites in vector spaces.

Lee C, Huang C. BMC Bioinformatics. 2012; 13:215.

PMID: 23244338 PMC: 3543194. DOI: 10.1186/1471-2105-13-215.


PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites.

Song J, Tan H, Perry A, Akutsu T, Webb G, Whisstock J. PLoS One. 2012; 7(11):e50300.

PMID: 23209700 PMC: 3510211. DOI: 10.1371/journal.pone.0050300.


Mechanisms and evolution of control logic in prokaryotic transcriptional regulation.

van Hijum S, Medema M, Kuipers O. Microbiol Mol Biol Rev. 2009; 73(3):481-509, Table of Contents.

PMID: 19721087 PMC: 2738135. DOI: 10.1128/MMBR.00037-08.
