
A Novel Method for Predicting Activity of Cis-regulatory Modules, Based on a Diverse Training Set

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2016 Sep 10
PMID: 27609510
Citations: 2
Abstract

Motivation: With the rapid emergence of technologies for locating cis-regulatory modules (CRMs) genome-wide, the next pressing challenge is to assign precise functions to each CRM, i.e. to determine the spatiotemporal domains or cell-types where it drives expression. A popular approach to this task is to model the typical k-mer composition of a set of CRMs known to drive a common expression pattern, and assign that pattern to other CRMs exhibiting a similar k-mer composition. This approach does not rely on prior knowledge of transcription factors relevant to the CRM or their binding motifs, and is thus more widely applicable than motif-based methods for predicting CRM activity, but is also prone to false positive predictions.
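As a rough illustration of the k-mer composition approach described above, the following minimal sketch scores a candidate sequence against the average k-mer profile of a training set. The cosine similarity measure, the choice of k = 3, and the toy sequences are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies for a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def cosine(p, q):
    """Cosine similarity between two sparse k-mer frequency profiles."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def mean_profile(seqs, k=3):
    """Average the k-mer profiles of a set of sequences."""
    total = Counter()
    for s in seqs:
        for kmer, freq in kmer_profile(s, k).items():
            total[kmer] += freq
    return {kmer: freq / len(seqs) for kmer, freq in total.items()}

# Toy sequences (hypothetical, not real CRMs).
training_set = ["ACGTACGTACGT", "CGTACGTACGTA"]
candidate_similar = "GTACGTACGTAC"
candidate_different = "AAAAAATTTTTT"

centroid = mean_profile(training_set)
score_similar = cosine(kmer_profile(candidate_similar), centroid)
score_different = cosine(kmer_profile(candidate_different), centroid)
```

A candidate whose score is high relative to the training set would be assigned that set's expression pattern. Relying on a single similarity score in this way is what leaves the approach open to false positives.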

Results: We present a novel strategy to improve the above-mentioned approach: to predict if a CRM drives a specific gene expression pattern, assess not only how similar the CRM is to other CRMs with similar activity but also to CRMs with distinct activities. We use a state-of-the-art statistical method to quantify a CRM's sequence similarity to many different training sets of CRMs, and employ a classification algorithm to integrate these similarity scores into a single prediction of the CRM's activity. This strategy is shown to significantly improve CRM activity prediction over current approaches.
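The idea of integrating similarity scores against several training sets can be sketched as follows. This is not the IMMBoost implementation: it substitutes cosine similarity of 2-mer profiles for the statistical similarity method, and a small hand-written logistic regression for the classification algorithm. The pattern names and sequences are hypothetical toy data, and the training sequences are reused to build the profiles, which a real evaluation would avoid.

```python
from collections import Counter
from math import exp, sqrt

def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies of a DNA sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def cosine(p, q):
    """Cosine similarity between two sparse profiles."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def mean_profile(seqs, k=2):
    """Average k-mer profile of one training set."""
    total = Counter()
    for s in seqs:
        for kmer, freq in kmer_profile(s, k).items():
            total[kmer] += freq
    return {kmer: freq / len(seqs) for kmer, freq in total.items()}

def features(seq, profiles, k=2):
    """One similarity score per training set, in a fixed order."""
    return [cosine(kmer_profile(seq, k), prof) for prof in profiles]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent for logistic regression."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            grad = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(x, w, b):
    """Probability that a feature vector belongs to the target pattern."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical training sets, one per expression pattern.
training_sets = {
    "pattern_A": ["ACACACGTGT", "CACACAGTGT", "ACACGTGTAC"],
    "pattern_B": ["GGGCCCGGCC", "CCGGGCCCGG", "GGCCGGGCCC"],
}
profiles = [mean_profile(seqs) for seqs in training_sets.values()]

# Task: does a sequence drive pattern_A? Positives come from pattern_A,
# negatives from the set with a distinct activity (pattern_B).
X = [features(s, profiles) for seqs in training_sets.values() for s in seqs]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)

p_a = predict(features("CACACGTGTG", profiles), w, b)  # pattern_A-like sequence
p_b = predict(features("CCCGGGCCGG", profiles), w, b)  # pattern_B-like sequence
```

The point of the design is that the classifier sees similarity to the distinct-activity sets as well as to the target set, so a sequence that resembles every set equally is not automatically called positive.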

Availability and implementation: Our implementation of the new method, called IMMBoost, is freely available as source code at https://github.com/weiyangedward/IMMBoost

Contact: sinhas@illinois.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Citing Articles

Identification of gene specific cis-regulatory elements during differentiation of mouse embryonic stem cells: An integrative approach using high-throughput datasets.

Vijayabaskar M, Goode D, Obier N, Lichtinger M, Emmett A, Zainul Abidin F. PLoS Comput Biol. 2019; 15(11):e1007337.

PMID: 31682597 PMC: 6855567. DOI: 10.1371/journal.pcbi.1007337.


CRM Discovery Beyond Model Insects.

Kazemian M, Halfon M. Methods Mol Biol. 2018; 1858:117-139.

PMID: 30414115 PMC: 6482005. DOI: 10.1007/978-1-4939-8775-7_10.
