» Articles » PMID: 25589474

QPMS9: an Efficient Algorithm for Quorum Planted Motif Search

Overview
Journal Sci Rep
Specialty Science
Date 2015 Jan 16
PMID 25589474
Citations 7
Authors
Affiliations
Soon will be listed here.
Abstract

Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.

Citing Articles

A Review on Planted (, d) Motif Discovery Algorithms for Medical Diagnose.

Mohanty S, Pattnaik P, Al-Absi A, Kang D Sensors (Basel). 2022; 22(3).

PMID: 35161949 PMC: 8838483. DOI: 10.3390/s22031204.


Novel algorithms for LDD motif search.

Xiao P, Schiller M, Rajasekaran S BMC Genomics. 2019; 20(Suppl 5):424.

PMID: 31167665 PMC: 6551235. DOI: 10.1186/s12864-019-5701-6.


SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

Yu Q, Wei D, Huo H BMC Bioinformatics. 2018; 19(1):228.

PMID: 29914360 PMC: 6006848. DOI: 10.1186/s12859-018-2242-y.


Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.

Malik S, Sharma D, Khatri S IET Nanobiotechnol. 2017; 11(2):134-142.

PMID: 28476995 PMC: 8676462. DOI: 10.1049/iet-nbt.2016.0005.


PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets.

Yu Q, Huo H, Feng D Biomed Res Int. 2016; 2016:4986707.

PMID: 27843946 PMC: 5098105. DOI: 10.1155/2016/4986707.


References
1.
Dinh H, Rajasekaran S, Kundeti V . PMS5: an efficient exact algorithm for the (ℓ, d)-motif finding problem. BMC Bioinformatics. 2011; 12:410. PMC: 3269969. DOI: 10.1186/1471-2105-12-410. View

2.
Tanaka S . Improved Exact Enumerative Algorithms for the Planted (l, d)-Motif Search Problem. IEEE/ACM Trans Comput Biol Bioinform. 2015; 11(2):361-74. DOI: 10.1109/TCBB.2014.2306842. View

3.
Rajasekaran S, Balla S, Huang C . Exact algorithms for planted motif problems. J Comput Biol. 2005; 12(8):1117-28. DOI: 10.1089/cmb.2005.12.1117. View

4.
Yu Q, Huo H, Zhang Y, Guo H . PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search. PLoS One. 2012; 7(10):e48442. PMC: 3485246. DOI: 10.1371/journal.pone.0048442. View

5.
Davila J, Balla S, Rajasekaran S . Fast and practical algorithms for planted (l, d) motif search. IEEE/ACM Trans Comput Biol Bioinform. 2007; 4(4):544-52. DOI: 10.1109/TCBB.2007.70241. View