
Chromatin Accessibility Prediction Via a Hybrid Deep Convolutional Neural Network

Overview
Journal Bioinformatics
Specialty Biology
Date 2017 Oct 26
PMID 29069282
Citations 28
Abstract

Motivation: Most known genetic variants associated with inherited human diseases lie in non-coding regions that lack adequate interpretation, so functional sites must be discovered systematically across the whole genome and their implications deciphered precisely. Although computational approaches complement high-throughput biological experiments in annotating the human genome, accurately annotating regulatory elements in a specific cell type by automatically learning the DNA sequence code from large-scale sequencing data remains a major challenge. An accurate and interpretable model that learns DNA sequence signatures and enables the identification of causative genetic variants has therefore become essential in both genomic and genetic studies.

Results: We propose Deopen, a hybrid framework based mainly on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, our model performs better both in classifying accessible regions against randomly sampled background sequences and in regressing DNase-seq signals. We further visualize the convolutional kernels and show that the identified sequence signatures match known motifs. Finally, we demonstrate the sensitivity of our model in finding causative non-coding variants in an analysis of a breast cancer dataset. We expect Deopen to be widely applied, with either public or in-house chromatin accessibility data, to the annotation of the human genome and the identification of non-coding variants associated with diseases.
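
To make the idea of a convolutional kernel acting as a motif detector concrete, the following is a minimal sketch of the first stage of a generic sequence CNN: one-hot encoding of DNA, a single 1D convolution with ReLU, and global max pooling. It is not Deopen's architecture or code. The function names, the toy motif "TGACTCA", the hand-set kernel weights and the bias are all assumptions chosen for illustration; a trained network would learn its kernel weights from data.

```python
# Illustrative sketch only: NOT Deopen's architecture or code.
# All names, the toy motif, and the weights below are hypothetical.
from typing import List

ALPHABET = "ACGT"
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode a DNA string as a len(seq) x 4 matrix.

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def kernel_from_motif(motif: str) -> Matrix:
    """Build a hand-set kernel: +1 for the motif base, -1 for any other base."""
    return [[1.0 if base == letter else -1.0 for letter in ALPHABET]
            for base in motif.upper()]


def conv1d_scan(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide the kernel along the sequence (no padding) and apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(x) - width + 1):
        score = bias
        for offset in range(width):
            for channel in range(len(ALPHABET)):
                score += x[start + offset][channel] * kernel[offset][channel]
        activations.append(max(0.0, score))  # ReLU
    return activations


def global_max_pool(activations: List[float]) -> float:
    """Summarize a kernel's response over the whole sequence by its maximum."""
    return max(activations) if activations else 0.0


if __name__ == "__main__":
    kernel = kernel_from_motif("TGACTCA")  # toy 7-bp motif
    # With bias -6, only a perfect 7/7 match survives the ReLU.
    acts = conv1d_scan(one_hot("ACGTTGACTCAGGA"), kernel, bias=-6.0)
    print(acts, global_max_pool(acts))
```

In this sketch the kernel fires only where the sequence window matches the motif exactly, which is the sense in which a learned kernel can be read as a sequence signature and compared with known motifs.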

Availability And Implementation: Deopen is freely available at https://github.com/kimmo1019/Deopen.

Contact: ruijiang@tsinghua.edu.cn.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

EpiGePT: a pretrained transformer-based language model for context-specific human epigenomics.

Gao Z, Liu Q, Zeng W, Jiang R, Wong W. Genome Biol. 2024; 25(1):310.

PMID: 39696471 PMC: 11657395. DOI: 10.1186/s13059-024-03449-7.


dHICA: a deep transformer-based model enables accurate histone imputation from chromatin accessibility.

Wen W, Zhong J, Zhang Z, Jia L, Chu T, Wang N. Brief Bioinform. 2024; 25(6).

PMID: 39316943 PMC: 11421843. DOI: 10.1093/bib/bbae459.


ctGAN: combined transformation of gene expression and survival data with generative adversarial network.

Kim J, Seok J. Brief Bioinform. 2024; 25(4).

PMID: 38980369 PMC: 11232285. DOI: 10.1093/bib/bbae325.


Machine Learning to Advance Human Genome-Wide Association Studies.

Sigala R, Lagou V, Shmeliov A, Atito S, Kouchaki S, Awais M. Genes (Basel). 2024; 15(1).

PMID: 38254924 PMC: 10815885. DOI: 10.3390/genes15010034.


Early detection of hepatocellular carcinoma via no end-repair enzymatic methylation sequencing of cell-free DNA and pre-trained neural network.

Deng Z, Ji Y, Han B, Tan Z, Ren Y, Gao J. Genome Med. 2023; 15(1):93.

PMID: 37936230 PMC: 10631027. DOI: 10.1186/s13073-023-01238-8.

