
Advancing Regulatory Genomics With Machine Learning

Overview
Publisher: Sage Publications
Specialty: Biology
Date: 2024 Dec 30
PMID: 39735654
Abstract

In recent years, several machine learning (ML) approaches have been proposed to predict gene expression signals and chromatin features from DNA sequence alone. These models are often used to derive and, to some extent, assess putative new biological insights about gene regulation, and they have led to notable advances in regulatory genomics. This article reviews a selection of these methods, ranging from linear models to random forests, kernel methods, and more advanced deep learning models. Specifically, we detail the techniques and strategies that can be used to extract new gene-regulation hypotheses from these models. Furthermore, because these putative insights need to be validated with wet-lab experiments, we emphasize the importance of associating a measure of confidence with each extracted hypothesis. We review the procedures that have been proposed to measure this confidence for the different types of ML models, and we discuss how they do not all provide the same kind of information.
