
Discriminative Motif Optimization Based on Perceptron Training

Overview
Journal Bioinformatics
Specialty Biology
Date 2013 Dec 27
PMID 24369152
Citations 10
Abstract

Motivation: Generating accurate transcription factor (TF) binding site motifs from next-generation sequencing data, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and each sequence is relatively long. Most traditional motif finders are slow in handling such an enormous amount of data. To overcome this limitation, tools have been developed that trade accuracy for speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in the input sequences to generate motifs. The resulting motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization.

Results: We report a tool named Discriminative Motif Optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use the area under the receiver-operating characteristic curve (AUC) as a measure of the discriminating power of motifs, and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO on a large test set of 87 TFs from human, Drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets, and the magnitude of the increase is up to 39%.
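
To make the idea of AUC-driven, perceptron-style refinement concrete, the following is a minimal sketch and not DiMO's actual implementation. It assumes a motif is a list of per-position weight dictionaries, that a sequence is scored by its best-scoring window on one strand, and that every sequence is at least as long as the motif. All names (`best_window_score`, `auc`, `motif_auc`, `perceptron_epoch`, `optimize`) and the learning rate are hypothetical choices made for illustration.

```python
import random

BASES = "ACGT"

def best_window_score(pwm, seq):
    """Return (score, start) of the best-scoring window of seq under pwm.
    Assumes len(seq) >= len(pwm); only the given strand is scanned."""
    w = len(pwm)
    best, best_pos = float("-inf"), -1
    for i in range(len(seq) - w + 1):
        s = sum(pwm[j][seq[i + j]] for j in range(w))
        if s > best:
            best, best_pos = s, i
    return best, best_pos

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied pairs count as half a correct ranking."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def motif_auc(pwm, positives, negatives):
    """AUC of a motif for separating positive from negative sequences."""
    return auc([best_window_score(pwm, s)[0] for s in positives],
               [best_window_score(pwm, s)[0] for s in negatives])

def perceptron_epoch(pwm, positives, negatives, lr=0.1, rng=None):
    """One pass of pairwise perceptron updates (an illustrative assumption):
    whenever a positive does not outscore a negative, raise the weights used
    by the positive's best window and lower those used by the negative's."""
    rng = rng or random.Random(0)
    pairs = [(p, n) for p in positives for n in negatives]
    rng.shuffle(pairs)
    for p, n in pairs:
        ps, pi = best_window_score(pwm, p)
        ns, ni = best_window_score(pwm, n)
        if ps <= ns:
            for j in range(len(pwm)):
                pwm[j][p[pi + j]] += lr
                pwm[j][n[ni + j]] -= lr
    return pwm

def optimize(seed, positives, negatives, epochs=20):
    """Refine a copy of the seed motif and keep the matrix with the best AUC."""
    best = [dict(col) for col in seed]
    best_auc = motif_auc(best, positives, negatives)
    current = [dict(col) for col in seed]
    for _ in range(epochs):
        perceptron_epoch(current, positives, negatives)
        a = motif_auc(current, positives, negatives)
        if a > best_auc:
            best, best_auc = [dict(col) for col in current], a
    return best, best_auc
```

Because `optimize` only keeps a matrix when its AUC exceeds the best seen so far, the returned AUC on the training data is never lower than that of the seed. As the abstract notes, a fair assessment would score the returned motif on a separate test set.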

Availability And Implementation: DiMO is available at http://stormo.wustl.edu/DiMO

Citing Articles

Sharing DNA-binding information across structurally similar proteins enables accurate specificity determination.

Wetzel J, Singh M Nucleic Acids Res. 2019; 48(2):e9.

PMID: 31777934 PMC: 7028011. DOI: 10.1093/nar/gkz1087.


A map of direct TF-DNA interactions in the human genome.

Gheorghe M, Sandve G, Khan A, Cheneby J, Ballester B, Mathelier A Nucleic Acids Res. 2018; 47(4):e21.

PMID: 30517703 PMC: 6393237. DOI: 10.1093/nar/gky1210.


A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

Guo Y, Tian K, Zeng H, Guo X, Gifford D Genome Res. 2018; 28(6):891-900.

PMID: 29654070 PMC: 5991515. DOI: 10.1101/gr.226852.117.


Comparison of discriminative motif optimization using matrix and DNA shape-based models.

Ruan S, Stormo G BMC Bioinformatics. 2018; 19(1):86.

PMID: 29510689 PMC: 5840810. DOI: 10.1186/s12859-018-2104-7.


Direct AUC optimization of regulatory motifs.

Zhu L, Zhang H, Huang D Bioinformatics. 2017; 33(14):i243-i251.

PMID: 28881989 PMC: 5870558. DOI: 10.1093/bioinformatics/btx255.

