
Combining Artificial Intelligence: Deep Learning with Hi-C Data to Predict the Functional Effects of Non-coding Variants

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2020 Nov 16
PMID: 33196774
Citations: 7
Abstract

Motivation: Although genome-wide association studies (GWASs) have identified thousands of variants for various traits, the causal variants and the mechanisms underlying the significant loci are largely unknown. In this study, we aim to predict non-coding variants that may functionally affect translation initiation through long-range chromatin interaction.

Results: By incorporating Hi-C data, we propose a novel and powerful deep learning model that classifies interacting and non-interacting fragment pairs and predicts the functional effects of single-nucleotide sequence alterations on chromatin interaction and thus on gene expression. The change in chromatin interaction probability between the reference sequence and the altered sequence reflects the degree of functional impact of the variant. The model classified interacting and non-interacting fragment pairs effectively and efficiently. The predicted causal SNPs that had a larger impact on chromatin interaction were more likely to be identified by GWAS and eQTL analyses. We demonstrate that an integrative approach combining deep learning with high-throughput experimental evidence of chromatin interaction helps prioritize the functional variants in disease- and phenotype-related loci and thus will greatly expedite uncovering the biological mechanisms underlying the associations identified in genomic studies.
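
The scoring idea in the Results paragraph (compare the predicted interaction probability for the reference sequence with that for the altered sequence) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `toy_interaction_probability` is a GC-content stand-in for a trained classifier, not the DeepHiC network, and `apply_snp` and `variant_effect` are illustrative helpers rather than functions from the authors' repository.

```python
import math
from typing import Callable

Predictor = Callable[[str, str], float]


def gc_fraction(sequence: str) -> float:
    """Fraction of G/C bases; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


def toy_interaction_probability(fragment_a: str, fragment_b: str) -> float:
    """Hypothetical stand-in for a trained model (NOT the authors' network).

    Maps the summed GC fraction of both fragments through a logistic
    function so that the output lies in (0, 1) like a probability.
    """
    score = 4.0 * (gc_fraction(fragment_a) + gc_fraction(fragment_b) - 1.0)
    return 1.0 / (1.0 + math.exp(-score))


def apply_snp(sequence: str, position: int, alt_base: str) -> str:
    """Return a copy of `sequence` with the base at `position` replaced."""
    if not 0 <= position < len(sequence):
        raise IndexError("SNP position lies outside the fragment")
    return sequence[:position] + alt_base + sequence[position + 1:]


def variant_effect(predict: Predictor, fragment_a: str, fragment_b: str,
                   position: int, alt_base: str) -> float:
    """Change in predicted interaction probability (altered minus reference)."""
    ref_prob = predict(fragment_a, fragment_b)
    alt_prob = predict(apply_snp(fragment_a, position, alt_base), fragment_b)
    return alt_prob - ref_prob


if __name__ == "__main__":
    delta = variant_effect(toy_interaction_probability, "AAAA", "CCCC", 0, "G")
    print(f"change in interaction probability: {delta:+.3f}")
```

Under this sketch, variants would be ranked by the magnitude of the returned difference; any predictor with the same two-fragment signature could be substituted for the toy function.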

Availability and Implementation: Source code used for data preparation and model training is available on GitHub (https://github.com/biocai/DeepHiC).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Decoding Non-coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases.

Pena-Martinez E, Rodriguez-Martinez J. Front Biosci (Schol Ed). 2024; 16(1):4.

PMID: 38538340 PMC: 11044903. DOI: 10.31083/j.fbs1601004.


Predicting functional consequences of SNPs on mRNA translation via machine learning.

Li Z, Chen L. Nucleic Acids Res. 2023; 51(15):7868-7881.

PMID: 37427781 PMC: 10450169. DOI: 10.1093/nar/gkad576.


Widespread allele-specific topological domains in the human genome are not confined to imprinted gene clusters.

Richer S, Tian Y, Schoenfelder S, Hurst L, Murrell A, Pisignano G. Genome Biol. 2023; 24(1):40.

PMID: 36869353 PMC: 9983196. DOI: 10.1186/s13059-023-02876-2.


Scalable approaches for functional analyses of whole-genome sequencing non-coding variants.

Kuksa P, Greenfest-Allen E, Cifello J, Ionita M, Wang H, Nicaretta H. Hum Mol Genet. 2022; 31(R1):R62-R72.

PMID: 35943817 PMC: 9585666. DOI: 10.1093/hmg/ddac191.


Recurrent noncoding somatic and germline WT1 variants converge to disrupt MYB binding in acute promyelocytic leukemia.

Song H, Liu Y, Tan Y, Zhang Y, Jin W, Chen L. Blood. 2022; 140(10):1132-1144.

PMID: 35653587 PMC: 9461475. DOI: 10.1182/blood.2021014945.

