
NCAE: Data-driven Representations Using a Deep Network-coherent DNA Methylation Autoencoder Identify Robust Disease and Risk Factor Signatures

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2023 Aug 17
PMID: 37587790
Abstract

Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
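The two-stage idea in the abstract (learn a compressed embedding of methylation profiles, then train a predictor on that embedding) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' NCAE: the data are synthetic, the autoencoder has a single hidden layer, the sizes and learning rate are arbitrary, the network-coherence analysis against the protein interactome is omitted, and a plain least-squares regression stands in for the interpretable deep neural networks described in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a methylation matrix (assumption, not real data):
# rows are samples, columns are CpG sites, values are beta values in (0, 1).
n_samples, n_cpgs, n_latent = 200, 50, 8
factors = rng.normal(size=(n_samples, 3))       # hidden biological factors
loadings = rng.normal(size=(3, n_cpgs))
X = 1.0 / (1.0 + np.exp(-(factors @ loadings)))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Stage 1: one-hidden-layer autoencoder (tanh encoder, sigmoid decoder).
W_enc = rng.normal(scale=0.1, size=(n_cpgs, n_latent))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(scale=0.1, size=(n_latent, n_cpgs))
b_dec = np.zeros(n_cpgs)


def encode(data):
    return np.tanh(data @ W_enc + b_enc)


def decode(hidden):
    return sigmoid(hidden @ W_dec + b_dec)


def reconstruction_mse(data):
    return float(np.mean((decode(encode(data)) - data) ** 2))


initial_loss = reconstruction_mse(X)
learning_rate = 0.2
for _ in range(1000):
    H = encode(X)
    X_hat = decode(H)
    # Backpropagate the mean squared reconstruction error.
    d_out = 2.0 * (X_hat - X) / X.size * X_hat * (1.0 - X_hat)
    d_hid = (d_out @ W_dec.T) * (1.0 - H ** 2)
    W_dec -= learning_rate * (H.T @ d_out)
    b_dec -= learning_rate * d_out.sum(axis=0)
    W_enc -= learning_rate * (X.T @ d_hid)
    b_enc -= learning_rate * d_hid.sum(axis=0)
final_loss = reconstruction_mse(X)

# Stage 2: predict a synthetic "age" phenotype from the latent embedding.
# Linear least squares is a swapped-in simplification of the downstream model.
Z = encode(X)
age = 50.0 + 10.0 * factors[:, 0] + rng.normal(scale=1.0, size=n_samples)
design = np.column_stack([Z, np.ones(n_samples)])  # embedding plus intercept
coef, *_ = np.linalg.lstsq(design, age, rcond=None)
pred = design @ coef

print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")
print(f"age fit MSE on training data: {np.mean((pred - age) ** 2):.2f}")
```

The sketch reports only training-set fit; any real evaluation would need held-out samples, and the choice of latent size is the kind of architecture decision the study explores systematically.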

Citing Articles

Precise and interpretable neural networks reveal epigenetic signatures of aging across youth in health and disease.

Martinez-Enguita D, Hillerton T, Akesson J, Kling D, Lerm M, Gustafsson M. Front Aging. 2025; 5:1526146.

PMID: 39916723 PMC: 11799293. DOI: 10.3389/fragi.2024.1526146.
