» Articles » PMID: 21172058

RecMotif: a Novel Fast Algorithm for Weak Motif Discovery

Overview
Publisher Biomed Central
Specialty Biology
Date 2010 Dec 22
PMID 21172058
Citations 8
Authors
Affiliations
Soon will be listed here.
Abstract

Background: Weak motif discovery in DNA sequences is an important but unresolved problem in computational biology. Previous algorithms that aimed to solve the problem usually require a large amount of memory or execution time. In this paper, we proposed a fast and memory efficient algorithm, RecMotif, which guarantees to discover all motifs with specific (l, d) settings (where l is the motif length and d is the maximum number of mutations between a motif instance and the true motif).

Results: Comparisons with several recently proposed algorithms have shown that RecMotif is more scalable for handling longer and weaker motifs. For instance, it can solve the open challenge cases such as (40, 14) within 5 hours while the other algorithms compared failed due to either longer execution times or shortage of memory space. For real biological sequences, such as E.coli CRP, RecMotif is able to accurately discover the motif instances with (l, d) as (18, 6) in less than 1 second, which is faster than the other algorithms compared.

Conclusions: RecMotif is a novel algorithm that requires only a space complexity of O(m²n) (where m is the number of sequences in the data and n is the length of the sequences).

Citing Articles

Review of Different Sequence Motif Finding Algorithms.

Hashim F, Mabrouk M, Al-Atabany W Avicenna J Med Biotechnol. 2019; 11(2):130-148.

PMID: 31057715 PMC: 6490410.


SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

Yu Q, Wei D, Huo H BMC Bioinformatics. 2018; 19(1):228.

PMID: 29914360 PMC: 6006848. DOI: 10.1186/s12859-018-2242-y.


RefSelect: a reference sequence selection algorithm for planted (l, d) motif search.

Yu Q, Huo H, Zhao R, Feng D, Vitter J, Huan J BMC Bioinformatics. 2016; 17 Suppl 9:266.

PMID: 27454113 PMC: 4959363. DOI: 10.1186/s12859-016-1130-6.


Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

Li Z, Cao H, Cui Y, Zhang Y Theor Biol Med Model. 2016; 13:2.

PMID: 26811154 PMC: 4727310. DOI: 10.1186/s12976-016-0028-3.


A new exhaustive method and strategy for finding motifs in ChIP-enriched regions.

Jia C, Carson M, Wang Y, Lin Y, Lu H PLoS One. 2014; 9(1):e86044.

PMID: 24475069 PMC: 3901781. DOI: 10.1371/journal.pone.0086044.


References
1.
Wang G, Yu T, Zhang W . WordSpy: identifying transcription factor binding motifs by building a dictionary and learning a grammar. Nucleic Acids Res. 2005; 33(Web Server issue):W412-6. PMC: 1160252. DOI: 10.1093/nar/gki492. View

2.
Buhler J, Tompa M . Finding motifs using random projections. J Comput Biol. 2002; 9(2):225-42. DOI: 10.1089/10665270252935430. View

3.
Stormo G, Hartzell 3rd G . Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci U S A. 1989; 86(4):1183-7. PMC: 286650. DOI: 10.1073/pnas.86.4.1183. View

4.
Bailey T, Elkan C . Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994; 2:28-36. View

5.
Wijaya E, Yiu S, Son N, Kanagasabai R, Sung W . MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders. Bioinformatics. 2008; 24(20):2288-95. DOI: 10.1093/bioinformatics/btn420. View